voicepaste
Global push-to-talk voice transcription tool. Press a hotkey to start recording and press it again to stop. The audio is transcribed and the resulting text is pasted into the currently focused element.
How it works
- Ctrl+Shift+Space → start recording from microphone
- Ctrl+Shift+Space again → stop recording
- Audio is saved to a temporary WAV file and passed to the transcriber (currently a stub, see TODO)
- Transcribed text is pasted into whatever is focused (clipboard + Ctrl+V)
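The start/stop behavior above is a two-state toggle driven by a single hotkey. A minimal sketch of that logic follows; the names State, Toggler, and NewToggler are hypothetical and are not taken from the project's source, and the callbacks only stand in for the real record, transcribe, and paste steps.

```go
package main

import "fmt"

// State is the recorder's current mode (hypothetical type, for illustration).
type State int

const (
	Idle State = iota
	Recording
)

// Toggler models "press once to start, press again to stop".
// The start and stop callbacks stand in for the real audio and paste code.
type Toggler struct {
	state State
	start func()
	stop  func()
}

// NewToggler returns a Toggler that begins in the Idle state.
func NewToggler(start, stop func()) *Toggler {
	return &Toggler{state: Idle, start: start, stop: stop}
}

// Press handles one hotkey event and returns the resulting state.
func (t *Toggler) Press() State {
	switch t.state {
	case Idle:
		t.start()
		t.state = Recording
	case Recording:
		t.stop()
		t.state = Idle
	}
	return t.state
}

func main() {
	tg := NewToggler(
		func() { fmt.Println("recording started") },
		func() { fmt.Println("recording stopped: transcribe, then paste") },
	)
	tg.Press() // first Ctrl+Shift+Space
	tg.Press() // second Ctrl+Shift+Space
}
```

Keeping the state in one place like this would also make a later hold-to-talk mode a matter of mapping key-down and key-up events to the same two callbacks.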
Prerequisites
Windows
- PortAudio: download from http://www.portaudio.com or install via vcpkg or MSYS2
- No additional tools needed (uses Win32 APIs for hotkey and paste)
Linux (Ubuntu/Debian)
sudo apt install libportaudio2 portaudio19-dev libx11-dev xclip xdotool
Building (both platforms need Go 1.22+)
# Native build (auto-detects platform)
make build
# Or explicitly:
make build-linux
make build-windows # needs mingw: sudo apt install gcc-mingw-w64-x86-64
Usage
./voicepaste
Then press Ctrl+Shift+Space to toggle recording. Logs go to stderr.
WSL2
Run the Windows binary (voicepaste.exe). It captures the global hotkey, records via Windows audio, and pastes into whatever window is focused, including your WSL2 terminal.
Architecture
cmd/voicepaste/main.go - entry point, wires everything together
internal/
  hotkey/
    hotkey.go - shared Handle type
    hotkey_windows.go - Win32 RegisterHotKey + GetMessage loop
    hotkey_linux.go - X11 XGrabKey + XNextEvent loop
  audio/
    audio.go - PortAudio recording + WAV encoding
  paste/
    paste_windows.go - Win32 clipboard + SendInput(Ctrl+V)
    paste_linux.go - xclip + xdotool
  transcribe/
    transcribe.go - stub (returns "ABCXYZ"), replace with API call
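The audio package is described as handling WAV encoding. As a rough sketch of that step only (not the project's actual code), the function below wraps mono 16-bit PCM samples in a standard 44-byte RIFF/WAVE header. The name EncodeWAV, the mono layout, and the 16 kHz sample rate in the example are assumptions for illustration; PortAudio capture is left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// EncodeWAV returns a complete WAV file (header + data) for mono,
// 16-bit little-endian PCM samples. Hypothetical helper, for illustration.
func EncodeWAV(samples []int16, sampleRate int) []byte {
	const (
		numChannels   = 1
		bitsPerSample = 16
	)
	dataLen := len(samples) * 2 // two bytes per 16-bit sample
	blockAlign := numChannels * bitsPerSample / 8
	byteRate := sampleRate * blockAlign

	var buf bytes.Buffer
	le := binary.LittleEndian

	// RIFF container header. Writes to a bytes.Buffer do not fail,
	// so the returned errors are ignored in this sketch.
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(36+dataLen)) // file size minus 8 bytes
	buf.WriteString("WAVE")

	// "fmt " chunk describing the sample format.
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // chunk size for plain PCM
	binary.Write(&buf, le, uint16(1))  // audio format 1 = PCM
	binary.Write(&buf, le, uint16(numChannels))
	binary.Write(&buf, le, uint32(sampleRate))
	binary.Write(&buf, le, uint32(byteRate))
	binary.Write(&buf, le, uint16(blockAlign))
	binary.Write(&buf, le, uint16(bitsPerSample))

	// "data" chunk holding the raw samples.
	buf.WriteString("data")
	binary.Write(&buf, le, uint32(dataLen))
	binary.Write(&buf, le, samples)

	return buf.Bytes()
}

func main() {
	wav := EncodeWAV([]int16{0, 1000, -1000, 0}, 16000)
	fmt.Printf("%d bytes, magic %q\n", len(wav), wav[:4])
}
```

Returning a byte slice rather than writing straight to disk keeps the encoder easy to test; the caller can write the result to a temp file or pass it to a transcription request body directly.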
TODO
- Replace transcribe stub with OpenAI Whisper API call
- Config file for hotkey, API key, etc.
- Audio level indicator / tray icon
- Wayland support (wl-copy + wtype/ydotool)
- Optional: hold-to-talk mode