# stt-ptt

Push to Talk Speech to Text using Whisper.

## Description

stt-ptt is a simple push-to-talk speech-to-text tool that uses whisper.cpp for transcription. It records audio via PipeWire, transcribes it with a local Whisper model, and types the result using wtype (Wayland).

## Features

- **Push to Talk**: Start/stop recording with simple commands
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout via environment variables
- **Lightweight**: Minimal dependencies, no cloud services

## Installation

### Via Home Manager Module (Recommended)

See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.

### Via Overlay

With the overlay applied to your `pkgs`:

```nix
{pkgs, ...}: {
  home.packages = [pkgs.stt-ptt];
}
```

### Direct Reference

`inputs` must be passed into your Home Manager modules (for example via `extraSpecialArgs`):

```nix
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.system}.stt-ptt
  ];
}
```

## Usage

### Basic Commands

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe
stt-ptt stop
```

### Keybinding Setup

The tool is designed to be bound to a key: hold the key to record, release it to transcribe.

#### Hyprland

```nix
# In your Home Manager configuration
wayland.windowManager.hyprland.settings = {
  bind = [
    # Press Super+V to start recording
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    # Release Super+V to stop recording and transcribe
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

#### i3 (X11 - requires xdotool instead of wtype)

stt-ptt types text with wtype, which is Wayland-only. For X11, you would need to modify the script to use xdotool.
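If you want to experiment with X11 support, the typing step is the part that would change. The sketch below is a hypothetical helper, not part of stt-ptt: the function names `typing_backend` and `type_text` are illustrative, and it assumes the session type is reported in `XDG_SESSION_TYPE` and that `xdotool` is installed on X11.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of stt-ptt: pick a typing tool
# based on the session type exposed in XDG_SESSION_TYPE.
typing_backend() {
  case "${XDG_SESSION_TYPE:-}" in
    wayland) echo "wtype" ;;
    x11)     echo "xdotool" ;;
    *)       echo "unsupported" ;;
  esac
}

# Type a string into the focused window with the matching backend.
type_text() {
  local text="$1"
  case "$(typing_backend)" in
    wtype)   wtype "$text" ;;
    xdotool) xdotool type -- "$text" ;;
    *)
      echo "Unsupported session type: ${XDG_SESSION_TYPE:-unset}" >&2
      return 1
      ;;
  esac
}
```

For example, `type_text "hello world"` would type the string on either session type, and fail with an error message on anything else (such as a bare TTY).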
### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Notification timeout in ms | `3000` |

## Requirements

- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording

## Model Setup

Download a Whisper model from [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main):

```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models

# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

Or use the Home Manager module, which handles this automatically.

## Available Models

| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6GB | High | Fast |
| `ggml-large-v3` | 2.9GB | Highest | Slowest |

Models ending in `.en` are English-only; for English speech they tend to be more accurate than their multilingual counterparts, especially at the tiny and base sizes.
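To try a different model from the table, the download URL follows the same pattern as the Model Setup command. The following is a hypothetical bash helper (`model_url`, `model_path` and `fetch_model` are illustrative names, not stt-ptt commands). It assumes the default model directory shown in the `STT_MODEL` row and prints an `export` line that points stt-ptt at the downloaded file.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of stt-ptt.

# Base URL of the ggerganov/whisper.cpp model repository on Hugging Face.
WHISPER_MODEL_BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Print the download URL for a model name such as "ggml-base.en".
model_url() {
  printf '%s/%s.bin\n' "$WHISPER_MODEL_BASE_URL" "$1"
}

# Print the local path for a model inside the default stt-ptt model directory.
model_path() {
  printf '%s/.local/share/stt-ptt/models/%s.bin\n' "$HOME" "$1"
}

# Download the model unless it already exists, then print an export line.
# The download goes to a ".part" file first so an interrupted transfer
# is not mistaken for a complete model on the next run.
fetch_model() {
  local name="$1" dest
  dest="$(model_path "$name")"
  if [ ! -f "$dest" ]; then
    mkdir -p "$(dirname "$dest")"
    curl -L --fail -o "$dest.part" "$(model_url "$name")" || return 1
    mv "$dest.part" "$dest"
  fi
  printf 'export STT_MODEL=%q\n' "$dest"
}
```

For example, `eval "$(fetch_model ggml-base.en)"` downloads the English-only base model if needed and sets `STT_MODEL` for the current shell. curl's progress output goes to stderr, so only the `export` line is evaluated.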
## Platform Support

- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)

## Build Information

- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT

## Troubleshooting

### Model Not Found

Error: `Error: Model not found at /path/to/model`

**Solution**: Download a model or use the Home Manager module:

```bash
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

If the model is stored elsewhere, set `STT_MODEL` to its path.

### No Audio Recorded

**Solution**: Ensure PipeWire is running:

```bash
systemctl --user status pipewire
```

### Text Not Typed

**Solution**: Ensure you are in a Wayland session and that your compositor allows wtype to send input:

```bash
# Check if running on Wayland
echo "$XDG_SESSION_TYPE"  # Should print "wayland"
```

### Slow Transcription

**Solution**: Use a smaller model or enable GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en"; # Smaller, faster model
};
```

Or with GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  # Choose one:
  whisperPackage = pkgs.whisper-cpp-vulkan; # Vulkan (pre-built)
  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; }; # NVIDIA
  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; }; # AMD
};
```

## Related

- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages