16 subtitle conversion library #17

Open
sjiamnocna wants to merge 15 commits into master from 16-subtitle-conversion-lib

sjiamnocna commented Jul 22, 2025

#16

Streaming Subtitle Conversion Feature

Overview

This pull request adds streaming subtitle conversion. The system can now accept a stream of subtitle items and emit valid subtitle output (SRT, VTT, plain text, etc.) in real time, while the video is still being transcribed. Users receive subtitle data incrementally, which improves responsiveness for long or live videos.

Why Incremental Subtitles?

While most users typically want the complete, finalized subtitle file (e.g., for download, archiving, or sharing), there are important scenarios where incremental or partial subtitle output is valuable:

  • Live or long-form content: For live streams, webinars, or lengthy videos, users may want to download subtitles as the content progresses, rather than waiting until the end. This is especially helpful for accessibility or for viewers joining a live event in real time.
  • Real-time feedback: Developers, editors, or QA testers might want to see how subtitles are being generated as the video is processed, to catch errors or make adjustments on the fly.
  • Progressive download/display: Some platforms allow users to start watching (and reading subtitles) before the entire video or subtitle file is ready—think of live captions or instant streaming features.
  • Partial data recovery: If a stream is interrupted, having the partial subtitles up to that point can be better than losing everything.

For most users, the default remains to wait for the whole subtitle file. The streaming feature is optional and designed to enhance flexibility for advanced or real-time use cases.

Motivation

  • Faster feedback: Users can see subtitles as soon as they are available, without waiting for the entire transcription to finish.
  • Improved UX: Enables real-time subtitle display and download for live or long-form content.
  • API flexibility: Supports both batch and streaming workflows for subtitle consumers.

Key Features

  • Stream-based processing: Accepts a stream (generator/iterator) of strongly typed subtitle items.
  • Incremental output: Generates valid output in the requested format (SRT, VTT, plain text, etc.) as new subtitle items arrive.
  • Type-safe and Effect-based: All streaming logic is implemented using EffectTS, ensuring type safety and robust error handling.
  • Environment agnostic: No environment-specific APIs are used; works in browser, Node.js, Bun, Cloudflare Workers, and AWS Lambda.
  • Format support:
    • SRT (SubRip)
    • VTT (WebVTT)
    • Plain text
    • JSON (for completeness)
  • Error handling:
    • Typed errors for malformed input, unsupported formats, or missing fields
    • Errors are catchable and integrate with existing error handling patterns
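The typed error cases listed above (malformed input, unsupported formats, missing fields) might be modelled roughly as follows. This is a minimal sketch in plain TypeScript rather than EffectTS, so that it runs without dependencies. The `SubtitleItem` fields, the error tags, the `Result` type, and the `parseFormat` and `validateItem` helpers are all assumptions made for illustration; none of them are taken from this PR's code.

```typescript
// Hypothetical item shape; the PR does not show the real SubtitleItem type.
interface SubtitleItem {
  start: number; // milliseconds from the start of the video
  end: number; // milliseconds from the start of the video
  text: string;
}

// One tagged variant per failure case named in the feature list.
type SubtitleError =
  | { _tag: "MissingField"; field: string }
  | { _tag: "MalformedInput"; reason: string }
  | { _tag: "UnsupportedFormat"; format: string };

// Plain result type standing in for an Effect-style typed failure channel.
type Result<A> = { ok: true; value: A } | { ok: false; error: SubtitleError };

const SUPPORTED_FORMATS = ["srt", "vtt", "plain-text", "json"] as const;
type SubtitleFormat = (typeof SUPPORTED_FORMATS)[number];

// Reject any format string outside the supported list.
function parseFormat(format: string): Result<SubtitleFormat> {
  const match = SUPPORTED_FORMATS.find((f) => f === format);
  return match !== undefined
    ? { ok: true, value: match }
    : { ok: false, error: { _tag: "UnsupportedFormat", format } };
}

// Check an untrusted value before it enters the conversion stream.
function validateItem(input: unknown): Result<SubtitleItem> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: { _tag: "MalformedInput", reason: "item is not an object" } };
  }
  const record = input as Record<string, unknown>;
  for (const field of ["start", "end", "text"]) {
    if (!(field in record)) {
      return { ok: false, error: { _tag: "MissingField", field } };
    }
  }
  const { start, end, text } = record;
  if (typeof start !== "number" || typeof end !== "number" || typeof text !== "string") {
    return { ok: false, error: { _tag: "MalformedInput", reason: "wrong field types" } };
  }
  if (!Number.isFinite(start) || !Number.isFinite(end) || start < 0 || end < start) {
    return { ok: false, error: { _tag: "MalformedInput", reason: "invalid time range" } };
  }
  return { ok: true, value: { start, end, text } };
}
```

Because every error carries a `_tag`, a caller can branch on it with a `switch` and the compiler will flag any unhandled case.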

Usage Example

// Example: Streaming SRT output as subtitles are transcribed
const subtitleStream = getSubtitleStream(); // Generator or async iterator of SubtitleItem
for await (const srtChunk of streamSubtitleConversion(subtitleStream, 'srt')) {
  // srtChunk is a valid SRT fragment for the current subtitle item
  sendToClient(srtChunk); // e.g., push to websocket or HTTP stream
}
  • The same pattern applies for 'vtt', 'plain-text', or 'json' formats.
  • The output is always a valid fragment that can be concatenated to form a complete file.
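To make the "concatenable fragment" idea concrete, here is a minimal sketch of how per-item SRT and VTT fragments could be produced with async generators. It is not the PR's implementation: `SubtitleItem` with millisecond `start`/`end` fields, `formatTimestamp`, `toSrtFragments`, and `toVttFragments` are hypothetical names. One detail worth noting is that a WebVTT file must begin with a `WEBVTT` header line, so this sketch emits the header as its own first fragment; how the PR handles the header is not shown in the description.

```typescript
// Hypothetical item shape; times are assumed to be in milliseconds.
interface SubtitleItem {
  start: number;
  end: number;
  text: string;
}

type ItemSource = AsyncIterable<SubtitleItem> | Iterable<SubtitleItem>;

// Format milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT).
function formatTimestamp(ms: number, separator: "," | "."): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = Math.floor(ms % 1000);
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(millis, 3)}`;
}

// Yield one numbered SRT cue per incoming item.
async function* toSrtFragments(items: ItemSource): AsyncGenerator<string> {
  let index = 1;
  for await (const item of items) {
    const range = `${formatTimestamp(item.start, ",")} --> ${formatTimestamp(item.end, ",")}`;
    yield `${index}\n${range}\n${item.text}\n\n`;
    index += 1;
  }
}

// Yield the mandatory WEBVTT header first, then one cue per incoming item.
async function* toVttFragments(items: ItemSource): AsyncGenerator<string> {
  yield "WEBVTT\n\n";
  for await (const item of items) {
    const range = `${formatTimestamp(item.start, ".")} --> ${formatTimestamp(item.end, ".")}`;
    yield `${range}\n${item.text}\n\n`;
  }
}
```

Each cue ends with a blank line, so fragments can be appended to a response body or file in arrival order without any extra joining logic.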

API Changes

  • New streaming conversion functions:
    • runSubtitleProcessingStream and runSubtitleConversionStream now accept streams/generators and output incremental results.
  • Backward compatible:
    • Existing batch conversion APIs are unchanged.
    • Streaming is opt-in and does not break current consumers.
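One way batch and streaming entry points can coexist is for the batch path to simply drain the stream and concatenate the fragments. The sketch below illustrates that relationship for plain-text output only; `plainTextStream` and `plainTextBatch` are hypothetical names, and the PR description does not say whether the existing batch APIs are actually built this way.

```typescript
// Hypothetical item shape; only `text` is used for plain-text output.
interface SubtitleItem {
  start: number;
  end: number;
  text: string;
}

type ItemSource = AsyncIterable<SubtitleItem> | Iterable<SubtitleItem>;

// Streaming path: emit one plain-text line per item as it arrives.
async function* plainTextStream(items: ItemSource): AsyncGenerator<string> {
  for await (const item of items) {
    yield `${item.text}\n`;
  }
}

// Batch path: drain the stream and return the complete document.
async function plainTextBatch(items: ItemSource): Promise<string> {
  let output = "";
  for await (const fragment of plainTextStream(items)) {
    output += fragment;
  }
  return output;
}
```

With this layering, a consumer that wants the finished file awaits the batch function, while a live consumer iterates the stream directly; both see the same bytes in the same order.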

Testing

  • Unit and integration tests for streaming conversion in all supported formats
  • Tests for error handling with malformed or incomplete input
  • Tests for environment agnosticism (no Node.js/browser-specific APIs in core logic)
  • Tests for correct output structure and incremental validity

Documentation

  • JSDoc and code comments for all new streaming functions
  • Usage examples for both batch and streaming workflows
  • Error types and handling examples

Checklist

  • Accepts stream/generator of strongly typed subtitle items
  • Outputs valid subtitle fragments in requested format as data arrives
  • Type-safe and Effect-based implementation
  • Environment agnostic (no platform-specific APIs)
  • Typed error handling for all failure cases
  • Comprehensive tests for streaming and error scenarios
  • Documentation and usage examples updated

This feature enables real-time subtitle delivery and is ready for review and integration. Please provide feedback or request additional scenarios if needed.

Development

Successfully merging this pull request may close these issues.

Subtitle Conversion Function Implementation

4 participants