ai-transcriber-cli

A Go CLI that transcribes audio and video files via the OpenAI speech-to-text API and writes results as txt, md, json, srt, or vtt.

Installation

From release archive (recommended, macOS on Apple Silicon)

base="https://github.com/mssoftjp/ai-transcriber-cli/releases/latest/download"
curl -L -O "$base/checksums.txt"
archive="$(awk '/darwin_arm64\.tar\.gz$/ {print $2; exit}' checksums.txt)"
curl -L -O "$base/$archive"
shasum -a 256 -c checksums.txt --ignore-missing
tar -xzf "$archive"
mkdir -p "$HOME/.local/bin"
install -m 0755 "${archive%.tar.gz}" "$HOME/.local/bin/transcriber"
export PATH="$HOME/.local/bin:$PATH"
transcriber version

If you prefer a manual install, download the latest archive from GitHub Releases, extract it, and place transcriber on your PATH.
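
For reference, the shasum -c step above amounts to hashing the downloaded archive and comparing the digest with its entry in checksums.txt. The Go sketch below illustrates that check, assuming each line has the form "<sha256-hex>  <file name>". The function names are illustrative and not part of the CLI.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// expectedSum returns the hex digest listed for name in a checksums.txt body
// whose lines look like "<sha256-hex>  <file name>" (assumed layout).
func expectedSum(checksums, name string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(checksums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && strings.TrimPrefix(fields[1], "*") == name {
			return strings.ToLower(fields[0]), true
		}
	}
	return "", false
}

// verify reports whether data hashes to the digest listed for name.
// A file that has no entry in the list is treated as a failure.
func verify(checksums, name string, data []byte) bool {
	want, ok := expectedSum(checksums, name)
	if !ok {
		return false
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == want
}

func main() {
	data := []byte("example archive bytes")
	sum := sha256.Sum256(data)
	list := hex.EncodeToString(sum[:]) + "  example.tar.gz\n"
	fmt.Println(verify(list, "example.tar.gz", data))        // true
	fmt.Println(verify(list, "example.tar.gz", []byte("x"))) // false
}
```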

Install ffmpeg (recommended)

ffmpeg and ffprobe are needed for video input, long-file chunking, trimming, and format normalization. Small audio files in a provider-compatible format can be transcribed without them.

When ffmpeg is used, the CLI encodes intermediate audio as compact, provider-ready .m4a files rather than large uncompressed WAV files. If you keep the workdir, the files left in it are exactly the files that were uploaded.

brew install ffmpeg          # macOS (https://formulae.brew.sh/formula/ffmpeg)
sudo apt install ffmpeg      # Debian / Ubuntu

Windows: download from ffmpeg.org/download.html and add the bin directory to your PATH.

Set your API key

export OPENAI_API_KEY="sk-..."

You can also store the key in the OS keychain:

printf '%s' "$OPENAI_API_KEY" | transcriber config key set --method keychain --stdin
transcriber config key status

The CLI resolves the API key in this order: the environment variable named by [api].key_env in the config, then OPENAI_API_KEY, then the OS keychain, and finally an optional local key file (key.txt) in the same directory as the config file. It never writes keys to config files, log files, or transcript output. Audio data is sent to the OpenAI API for transcription and is subject to OpenAI's data usage policies. No audio or transcript data is sent anywhere else.
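
The lookup order described above is a first-match chain. The following Go sketch shows the idea only; the source type, resolveKey, and the MY_CUSTOM_KEY variable are hypothetical names, and the keychain and key file lookups are stand-ins rather than the CLI's actual code.

```go
package main

import "fmt"

// source is one place a key may come from; lookup returns "" when absent.
type source struct {
	name   string
	lookup func() string
}

// resolveKey returns the first non-empty key and the name of its source.
func resolveKey(sources []source) (key, from string, ok bool) {
	for _, s := range sources {
		if v := s.lookup(); v != "" {
			return v, s.name, true
		}
	}
	return "", "", false
}

func main() {
	// A fake environment so the demo does not read real secrets.
	env := map[string]string{"OPENAI_API_KEY": "sk-from-env"}
	sources := []source{
		{"key_env", func() string { return env["MY_CUSTOM_KEY"] }}, // hypothetical [api].key_env value
		{"OPENAI_API_KEY", func() string { return env["OPENAI_API_KEY"] }},
		{"keychain", func() string { return "" }}, // stand-in for an OS keychain lookup
		{"key.txt", func() string { return "sk-from-file" }},
	}
	_, from, _ := resolveKey(sources)
	fmt.Println(from) // OPENAI_API_KEY
}
```

Only the source name is printed, in keeping with the rule that keys never appear in output.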

Verify

transcriber doctor

This checks API key visibility, ffmpeg / ffprobe availability, temp directory access, provider connectivity, and config validity.

From source

git clone https://github.com/mssoftjp/ai-transcriber-cli.git
cd ai-transcriber-cli
make install
transcriber version

Basic Usage

# Simplest form — writes Markdown output next to the input file
transcriber transcribe input.m4a

# Print plain text to stdout
transcriber transcribe input.m4a --format txt --stdout --events none

# Write JSON to a specific directory
transcriber transcribe input.m4a --format json --out-dir ./out

By default, the output file is written next to the input as <name>.transcript.md. A manifest sidecar (<name>.transcript.manifest.json) is also created. Use --out or --out-dir to change the destination, and --overwrite to allow replacing existing output.

Supported input formats include .mp3, .m4a, .wav, .flac, .ogg, .mp4, .mov, .mkv, and others. Run transcriber transcribe --help for the full list. Long files are automatically split into chunks and reassembled.

If a long client-chunked job fails partway through, re-run the same command with --resume to reuse completed chunks from the manifest sidecar and chunk cache next to the output artifacts.
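
Conceptually, resuming means skipping every chunk that is already recorded as complete and sending only the rest. The Go sketch below illustrates that selection step. The chunk struct is a hypothetical stand-in; the real manifest schema is not documented here and may differ.

```go
package main

import "fmt"

// chunk is a hypothetical manifest entry; the real manifest schema may differ.
type chunk struct {
	Index int
	Done  bool
	Text  string
}

// pending returns the indexes of chunks that still need to be transcribed.
func pending(chunks []chunk) []int {
	var todo []int
	for _, c := range chunks {
		if !c.Done {
			todo = append(todo, c.Index)
		}
	}
	return todo
}

func main() {
	manifest := []chunk{{0, true, "hello"}, {1, false, ""}, {2, true, "world"}, {3, false, ""}}
	fmt.Println(pending(manifest)) // [1 3]
}
```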

For long client-chunked jobs with gpt-4o-transcribe or gpt-4o-mini-transcribe, you can add --parallel to send chunks concurrently. This speeds up long runs, but it disables prompt carryover for those chunks.

Choosing a Model

| Model | Strengths | Good for |
| --- | --- | --- |
| gpt-4o-transcribe | High accuracy, preserves code-switched audio | General transcription (default) |
| gpt-4o-mini-transcribe | Lighter, lower cost | Cost-sensitive workloads |
| whisper-1 | Timestamp-capable output | Subtitle generation (srt/vtt) |
| gpt-4o-transcribe-diarize | Speaker-labeled output | Meetings, multi-speaker recordings |

# Generate subtitles
transcriber transcribe input.m4a --model whisper-1 --format srt

# Speaker diarization
transcriber transcribe call.m4a --diarize --format json
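
Subtitle formats depend on per-segment timestamps, which is why a timestamp-capable model matters for srt and vtt. As a rough illustration of the target format (not the CLI's own writer), this Go sketch renders one SRT cue. SRT separates milliseconds with a comma, while WebVTT uses a period. The function names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// stamp formats a non-negative offset as HH:MM:SS<sep>mmm.
// SRT uses "," as the separator and WebVTT uses ".".
func stamp(d time.Duration, sep string) string {
	ms := d.Milliseconds()
	return fmt.Sprintf("%02d:%02d:%02d%s%03d", ms/3600000, ms/60000%60, ms/1000%60, sep, ms%1000)
}

// srtCue renders one numbered SRT cue.
func srtCue(n int, start, end time.Duration, text string) string {
	return fmt.Sprintf("%d\n%s --> %s\n%s\n", n, stamp(start, ","), stamp(end, ","), text)
}

func main() {
	fmt.Print(srtCue(1, 1500*time.Millisecond, 3725*time.Millisecond, "Hello."))
	// 1
	// 00:00:01,500 --> 00:00:03,725
	// Hello.
}
```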

Language

The default is auto (automatic detection).

# Auto-detect — best for mixed-language audio
transcriber transcribe input.m4a --language auto

# Force a single language — suppresses other-language content
transcriber transcribe input.m4a --language ja

For audio that mixes multiple languages, auto tends to preserve the original speech more faithfully. Forcing a single language improves readability but may drop content in other languages.

Common Options

Time Range

transcriber transcribe input.m4a --start 30 --end 90
transcriber transcribe input.m4a --start 00:01:30 --end 00:03:00
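
Both forms above describe the same kind of value: an offset from the start of the file, given either in seconds or as HH:MM:SS. The Go sketch below shows one way such values could be normalized to seconds. parseOffset is a hypothetical helper, and whether the CLI accepts fractional seconds or MM:SS is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseOffset converts "90", "01:30", or "00:01:30" into seconds.
// Each colon-separated field is folded in base 60 from left to right.
func parseOffset(s string) (float64, error) {
	parts := strings.Split(s, ":")
	if len(parts) > 3 {
		return 0, fmt.Errorf("invalid offset %q", s)
	}
	total := 0.0
	for _, p := range parts {
		v, err := strconv.ParseFloat(p, 64)
		if err != nil || v < 0 || math.IsNaN(v) || math.IsInf(v, 0) {
			return 0, fmt.Errorf("invalid offset %q", s)
		}
		total = total*60 + v
	}
	return total, nil
}

func main() {
	for _, in := range []string{"30", "00:01:30", "00:03:00"} {
		sec, err := parseOffset(in)
		fmt.Println(in, sec, err)
	}
}
```

With these inputs, --start 00:01:30 --end 00:03:00 corresponds to the range from 90 to 180 seconds.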

Dictionary Corrections

A YAML dictionary file can automatically fix common recognition errors.

transcriber transcribe input.m4a --dictionary ./dict.yaml --dictionary-enabled
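
The idea behind dictionary corrections is a plain find-and-replace pass over the transcript. The Go sketch below illustrates it with an in-memory map standing in for the parsed YAML file. The dictionary schema is not shown in this README, so the map shape and the applyDictionary name are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// applyDictionary replaces each misrecognized term with its correction.
// Longer terms are listed first so that overlapping entries resolve predictably.
func applyDictionary(text string, dict map[string]string) string {
	keys := make([]string, 0, len(dict))
	for k := range dict {
		if k != "" { // an empty term would match everywhere
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j]
	})
	pairs := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		pairs = append(pairs, k, dict[k])
	}
	// strings.NewReplacer compares old strings in argument order.
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	dict := map[string]string{"open ai": "OpenAI", "chat gpt": "ChatGPT"}
	fmt.Println(applyDictionary("we used open ai and chat gpt", dict))
	// we used OpenAI and ChatGPT
}
```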

AI Postprocess

Runs a transcript-level correction pass after transcription. It does not summarize or translate.

transcriber transcribe input.m4a --postprocess

Faster Long-Form Runs

For gpt-4o-transcribe and gpt-4o-mini-transcribe, --parallel sends client chunks concurrently.

transcriber transcribe meeting.m4a --model gpt-4o-mini-transcribe --chunking-mode client --parallel

Notes:

  • --parallel is useful only when the execution plan uses client-side chunking
  • when the input fits in a single request or uses server-side chunking, --parallel has no effect
  • parallel chunk sending disables prompt carryover, so the default sequential mode remains the safer quality-first option
  • if you resume a partial client-chunked run, use the same --parallel setting as the original run
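
The trade-off in the notes above can be seen in a small Go sketch of bounded concurrent sending. Each chunk is processed independently and the results are reassembled in chunk order, which means a chunk cannot receive the previous chunk's text as a prompt. transcribeAll and the limit parameter are illustrative and do not describe the CLI's internals.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// transcribeAll runs fn on every chunk with at most limit calls in flight
// and returns the results in chunk order, regardless of completion order.
func transcribeAll(chunks []string, limit int, fn func(string) string) []string {
	if limit < 1 {
		limit = 1
	}
	out := make([]string, len(chunks))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int, c string) {
			defer wg.Done()
			defer func() { <-sem }()
			out[i] = fn(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	return out
}

func main() {
	chunks := []string{"chunk-0", "chunk-1", "chunk-2"}
	// strings.ToUpper stands in for an API call that sees only its own chunk.
	texts := transcribeAll(chunks, 2, strings.ToUpper)
	fmt.Println(strings.Join(texts, " ")) // CHUNK-0 CHUNK-1 CHUNK-2
}
```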

Pre-Flight Checks (dry-run / probe)

Inspect the execution plan without calling the API.

# probe: returns input metadata and the planned strategy as JSON
transcriber probe input.m4a

# dry-run: same entry point as transcribe, but stops after planning
transcriber transcribe input.m4a --dry-run --events none

JSONL Events (GUI / Automation)

transcriber transcribe input.m4a --events jsonl > events.jsonl

Emits machine-readable JSONL progress events to stdout. Designed for GUI wrappers and automation pipelines.
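
A wrapper can consume this stream by decoding one JSON object per line. The Go sketch below decodes events generically into maps because the event schema is not listed in this README. The field names in the sample are invented for the demo.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readEvents decodes one JSON object per line and skips blank lines.
func readEvents(r io.Reader) ([]map[string]any, error) {
	var events []map[string]any
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow lines up to 1 MiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev map[string]any
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return events, fmt.Errorf("bad event line: %w", err)
		}
		events = append(events, ev)
	}
	return events, sc.Err()
}

// parseEvents is a convenience wrapper for in-memory input.
func parseEvents(jsonl string) ([]map[string]any, error) {
	return readEvents(strings.NewReader(jsonl))
}

func main() {
	// Field names here are invented for the demo, not the CLI's event schema.
	sample := `{"type":"progress","done":1,"total":4}
{"type":"progress","done":2,"total":4}
`
	events, err := parseEvents(sample)
	fmt.Println(len(events), err) // 2 <nil>
}
```

In a real pipeline, readEvents would be given the CLI's stdout (or os.Stdin in a piped consumer) instead of an in-memory string.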

Interactive TUI

transcriber tui

The TUI is a helper for one-off local jobs. The batch CLI remains the primary interface for scripts, redirected output, and full option coverage. Use arrow keys or j/k to move, Enter to select or edit, s to start from the job screen, and Esc to go back or quit.

Configuration

A TOML config file lets you persist frequently used options as defaults.

  • macOS / Linux: ~/.config/transcriber/config.toml
  • Windows: %AppData%/transcriber/config.toml

# Generate a sample config
transcriber config init

# Validate the current config
transcriber config validate

Precedence: CLI flags > environment variables > config file > built-in defaults
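
The precedence rule is a first-set-wins lookup across the four layers. A minimal Go sketch, treating an empty string as "not set" (a simplification; firstSet is a hypothetical helper):

```go
package main

import "fmt"

// firstSet returns the first non-empty value, mirroring
// CLI flags > environment variables > config file > built-in defaults.
func firstSet(flag, env, config, def string) string {
	for _, v := range []string{flag, env, config} {
		if v != "" {
			return v
		}
	}
	return def
}

func main() {
	fmt.Println(firstSet("", "", "json", "md"))    // json (config file beats the default)
	fmt.Println(firstSet("srt", "", "json", "md")) // srt (flag beats everything)
}
```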

API keys should be kept in environment variables or the OS keychain. The CLI does not log API keys or write them to config files or any output. The --postprocess option sends transcript text (not audio) to the OpenAI API for correction; this is the only case where transcript content leaves the local machine after the initial transcription call.

See docs/config.md for the full reference.

Commands

| Command | Description |
| --- | --- |
| transcriber transcribe <input> | Run transcription |
| transcriber probe <input> | Inspect input and return the execution plan |
| transcriber doctor | Check environment (API key, dependencies) |
| transcriber tui | Open the interactive terminal UI for a single job |
| transcriber version | Print version metadata |
| transcriber config init | Print a sample config |
| transcriber config validate | Validate config and dictionary |

Run transcriber <command> --help for the full flag reference of each command.

Development

make build    # build
make test     # test
make ci       # lint + test + vet
make hooks    # enable Git hooks (once)

Integration Tests

Tests that call the real API are not run by go test ./... and must be invoked explicitly:

OPENAI_API_KEY=... go test ./internal/provider/openai ./internal/postprocess -run Integration -count=1

Packaging

Build a local binary:

make build

Build a versioned release archive plus checksums.txt:

make package VERSION=v0.4.0

Build a cross-target release archive:

make release-archive VERSION=v0.4.0 GOOS=darwin GOARCH=arm64

Packaging notes:

  • macOS and Linux archives are produced as .tar.gz
  • Windows archives are produced as .zip and contain transcriber_..._windows_amd64.exe

License

This project is licensed under the MIT License. See LICENSE.
