A voice input tool for macOS & Windows — trigger with a hotkey, speak, auto-paste.
- Lightweight: ~20 MB installer; ~120 MB idle on macOS packaged builds. Online ASR peaks around ~150 MB; local models load on demand and vary by model/backend.
- Data Security: All data is stored locally; you supply your own API keys, giving you full control.
- Fully Customizable: Ready-to-use defaults with all parameters exposed for fine-tuning.
- ASR Dual Engine (Online / Local): Online — ByteDance Doubao streaming ASR; Local — powered by sherpa-onnx, with CPU / CUDA / CoreML acceleration and model-dependent memory usage.
- LLM Support: Built-in support for 8 LLM providers — DeepSeek, OpenAI, Anthropic, Gemini, OpenRouter, SiliconFlow, Ollama, and OpenAI-compatible APIs.
- Streaming Output: For local models without native streaming, VAD-based segmentation with simulated streaming output delivers results in real time.
- Multi-Scenario Text Polishing: Built-in templates for general cleanup, translation, email drafting, and more — customizable prompts with per-template hotkey bindings.
- Hotwords: Multi-group hotword libraries to boost domain-specific term accuracy; automatically restores original formatting (capitalization, special characters, etc.), no manual corrections needed.
- Cross-Platform: macOS (Apple Silicon / Intel) and Windows.
- Customizable Hotkeys: Bind independent hotkeys for different scenarios (general, translation, formatting, etc.) with support for toggle (press to start, press again to stop) and hold (hold to speak, release to stop) modes.
- Enhanced Experience: Audio feedback sounds, real-time waveform animation.
- Apple Signed & Notarized: macOS builds are signed and notarized with an Apple Developer certificate — no Gatekeeper warnings on install (Windows builds are currently unsigned).
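
The toggle and hold hotkey modes listed above can be modeled as a small state machine. This is a minimal sketch for illustration, not VoicePaste's actual implementation; the `Mode` enum, the `HotkeyRecorder` class, and its `on_press` / `on_release` methods are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    TOGGLE = "toggle"  # press to start, press again to stop
    HOLD = "hold"      # hold to speak, release to stop


class HotkeyRecorder:
    """Hypothetical model of one hotkey binding; not VoicePaste's real API."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False

    def on_press(self) -> None:
        if self.mode is Mode.TOGGLE:
            # Each press flips the recording state.
            self.recording = not self.recording
        else:
            # In hold mode, recording lasts while the key is down.
            self.recording = True

    def on_release(self) -> None:
        if self.mode is Mode.HOLD:
            self.recording = False
        # In toggle mode, releasing the key changes nothing.
```

In this model, a toggle binding keeps recording after the key is released and stops on the next press, while a hold binding stops as soon as the key is released.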
Go to GitHub Releases and download the latest version for your platform.
| Platform | Installer Filename |
|---|---|
| macOS (Apple Silicon) | VoicePaste_{version}_aarch64.dmg |
| macOS (Intel) | VoicePaste_{version}_x64.dmg |
| Windows (x64) | VoicePaste_{version}_x64-setup.exe / .msi |
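
To illustrate the naming pattern in the table, the sketch below builds the expected installer filename for a given machine. The `installer_filename` helper is hypothetical, the machine-name mapping assumes the values commonly returned by Python's `platform.machine()` (`arm64` on Apple Silicon, `x86_64` on Intel Macs), and the `.msi` variant on Windows is not covered.

```python
import platform


def installer_filename(version: str, system: str = "", machine: str = "") -> str:
    """Return the installer filename from the table above (hypothetical helper).

    `system` and `machine` default to the values reported for this computer.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Darwin":
        # Apple Silicon reports arm64; anything else is treated as Intel.
        arch = "aarch64" if machine in ("arm64", "aarch64") else "x64"
        return f"VoicePaste_{version}_{arch}.dmg"
    if system == "Windows":
        # The table lists only an x64 build for Windows.
        return f"VoicePaste_{version}_x64-setup.exe"
    raise ValueError(f"no installer listed for {system}")
```

Pass the version string shown on the Releases page as `version`; no particular version number is assumed here.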
| Type | Link |
|---|---|
| General Setup | EN / 中文 |
| Doubao Streaming ASR | EN / 中文 |
| Local Models | EN / 中文 |
| Type | Model | Guide | Size | Peak Memory (macOS Activity Monitor) | Languages | Streaming | Hotwords | Punctuation | ITN | Model ID |
|---|---|---|---|---|---|---|---|---|---|---|
| Online | Doubao Streaming ASR 2.0 | EN / 中文 | - | ~150 MB | Chinese + English mix, dialects | ✅️ | ✅️ | ✅️ | ✅️ | - |
| Local | SenseVoice | EN / 中文 | 158 MB | ~580 MB | ZH / EN / JP / KO / Cantonese | ☑️ Simulated streaming | ☑️ via LLM | ☑️ via punctuation model | ☑️ via LLM | sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09 |
| Local | Zipformer (ZH + EN bilingual) | EN / 中文 | 150 MB | ~465 MB | Chinese + English | ✅️ | ✅️ | ☑️ via punctuation model | ☑️ via LLM | sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 |
| Local | FunASR-Nano | EN / 中文 | 948 MB | ~2.5 GB | Chinese + English, 7 dialects | ☑️ Simulated streaming | ✅️ | ✅️ | ✅️ | sherpa-onnx-funasr-nano-int8-2025-12-30 |
| Local | Qwen3-ASR-0.6B | EN / 中文 | 938 MB | Not tested | 30 languages, Chinese dialects, lyrics, rap | ☑️ Simulated streaming | ✅️ | ✅️ | ✅️ | sherpa-onnx-qwen3-asr-0.6B-int8-2026-03-25 |
Notes
- ✅️ Native model capability, ☑️ Achieved through software composition
- Idle memory is ~120 MB on macOS packaged builds, measured from the Activity Monitor Memory column across the VoicePaste process group. Models are loaded on demand during recognition.
- Local models without native streaming output use built-in VAD (Voice Activity Detection) for audio segmentation with simulated streaming; optional punctuation restoration model available.
- Memory data was measured locally on a Mac mini (Apple Silicon). Results may vary with system load, macOS memory compression, restart state, backend, and cache state. See the performance test report for full details.
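
The idea behind simulated streaming (segment the audio at pauses, then recognize each segment as soon as it closes) can be sketched as follows. This is an illustration under stated assumptions, not VoicePaste's or sherpa-onnx's implementation: a plain energy threshold stands in for a real VAD model, and `split_on_silence`, `simulated_streaming`, the `recognize` callback, and all parameter defaults are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def split_on_silence(
    samples: Sequence[float],
    frame_len: int = 160,
    threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> Iterator[List[float]]:
    """Yield speech segments separated by runs of low-energy frames.

    A plain energy threshold stands in for a real VAD model here.
    """
    segment: List[float] = []
    silent_run = 0
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            segment.extend(frame)
            silent_run = 0
        elif segment:
            # Short pauses are skipped rather than kept in the segment.
            silent_run += 1
            if silent_run >= min_silence_frames:
                yield segment
                segment = []
                silent_run = 0
    if segment:
        yield segment


def simulated_streaming(
    samples: Sequence[float],
    recognize: Callable[[List[float]], str],
) -> Iterator[str]:
    """Emit one result per segment instead of waiting for the whole recording."""
    for segment in split_on_silence(samples):
        yield recognize(segment)
```

Because each segment is handed to the recognizer as soon as a long enough pause is seen, text appears incrementally even though the model itself only processes complete segments.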
| Provider | Supported |
|---|---|
| OpenAI | ✅️ |
| DeepSeek | ✅️ |
| Anthropic | ✅️ |
| OpenRouter | ✅️ |
| SiliconFlow | ✅️ |
| Gemini | ✅️ |
| Ollama | ✅️ |
| OpenAI-Compatible | ✅️ |
VoicePaste requires Microphone and Accessibility permissions to function.
Microphone Permission
- Settings page → System Permissions → Click "Request Permission"
- System Settings → Privacy & Security → Microphone, ensure VoicePaste is authorized
- If previously denied, reset via Terminal and re-authorize:
    tccutil reset Microphone com.yolanda.voicepaste

Accessibility Permission
- System Settings → Privacy & Security → Accessibility, ensure VoicePaste is authorized
- If you reinstalled VoicePaste after deleting it, re-add it to the Accessibility list manually

