# stt-ptt Home Manager Module

Push to Talk Speech to Text for Home Manager.

## Overview

This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.

## Quick Start

```nix
{config, ...}: {
  imports = [m3ta-nixpkgs.homeManagerModules.default];

  cli.stt-ptt = {
    enable = true;
  };
}
```

This will:

- Install stt-ptt with default whisper-cpp
- Download the `ggml-large-v3-turbo` model on first activation
- Set environment variables for model path and notification timeout

## Module Options

### `cli.stt-ptt.enable`

Enable the stt-ptt module.

- Type: `boolean`
- Default: `false`

### `cli.stt-ptt.whisperPackage`

The whisper-cpp package to use for transcription.

- Type: `package`
- Default: `pkgs.whisper-cpp`

**Pre-built variants:**

```nix
# CPU (default)
whisperPackage = pkgs.whisper-cpp;

# Vulkan GPU acceleration (pre-built)
whisperPackage = pkgs.whisper-cpp-vulkan;
```

**Override options** (can be combined):

| Option | Description |
|--------|-------------|
| `cudaSupport` | NVIDIA CUDA acceleration |
| `rocmSupport` | AMD ROCm acceleration |
| `vulkanSupport` | Vulkan GPU acceleration |
| `coreMLSupport` | Apple CoreML (macOS only) |
| `metalSupport` | Apple Metal (macOS ARM only) |

```nix
# NVIDIA CUDA support
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };

# AMD ROCm support
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };

# Vulkan support (manual override)
whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
```

### `cli.stt-ptt.model`

The Whisper model to use. Models are automatically downloaded from HuggingFace on first activation.

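To make the download step concrete, here is a minimal sketch of how a model name might map to a download URL and a local file. It assumes that the URL pattern and target directory shown for `ggml-large-v3-turbo` under Troubleshooting apply to the other model names as well; the helper names `model_url` and `model_path` are hypothetical and not part of stt-ptt.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not part of stt-ptt.

# Directory the module is documented to use for downloaded models.
model_dir="${HOME}/.local/share/stt-ptt/models"

# Print the assumed HuggingFace URL for a model name such as "ggml-base.en".
model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s.bin\n' "$1"
}

# Print the assumed local path the model file would be stored at.
model_path() {
  printf '%s/%s.bin\n' "$model_dir" "$1"
}
```

With these defined, a manual fetch of another model could look like `mkdir -p "$model_dir" && curl -L -o "$(model_path ggml-base.en)" "$(model_url ggml-base.en)"`.
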
- Type: `string`
- Default: `"ggml-large-v3-turbo"`

Available models (sorted by size):

| Model | Size | Notes |
|-------|------|-------|
| `ggml-tiny` | 75MB | Fastest, lowest quality |
| `ggml-tiny.en` | 75MB | English-only, slightly faster |
| `ggml-base` | 142MB | Fast, basic quality |
| `ggml-base.en` | 142MB | English-only |
| `ggml-small` | 466MB | Balanced speed/quality |
| `ggml-small.en` | 466MB | English-only |
| `ggml-medium` | 1.5GB | Good quality |
| `ggml-medium.en` | 1.5GB | English-only |
| `ggml-large-v1` | 2.9GB | High quality (original) |
| `ggml-large-v2` | 2.9GB | High quality (improved) |
| `ggml-large-v3` | 2.9GB | Highest quality |
| `ggml-large-v3-turbo` | 1.6GB | High quality, optimized speed (recommended) |

Quantized versions (`q5_0`, `q5_1`, `q8_0`) are also available for reduced size.

### `cli.stt-ptt.notifyTimeout`

Notification timeout in milliseconds for the recording indicator.

- Type: `integer`
- Default: `3000`
- Example: `5000` (5 seconds), `0` (persistent)

### `cli.stt-ptt.language`

Language for speech recognition. Use "auto" for automatic language detection, or specify a language code for better accuracy.

- Type: `enum ["auto", "en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "hi", "tr", "pl", "nl", "sv", "da", "fi", "no", "vi", "th", "id", "uk", "cs"]`
- Default: `"auto"`

**Auto-detection**: When set to "auto", whisper.cpp analyzes the audio to determine the spoken language automatically.

**Language specification**: Specifying a language code improves transcription accuracy if you know the language in advance.

```nix
# Automatic language detection (default)
language = "auto";

# Force English transcription
language = "en";

# Spanish transcription
language = "es";
```

**Common language codes:**

| Code | Language |
|------|----------|
| `en` | English |
| `es` | Spanish |
| `fr` | French |
| `de` | German |
| `zh` | Chinese |
| `ja` | Japanese |
| `ko` | Korean |

whisper.cpp supports 100+ languages.
See whisper.cpp documentation for the full list.

## Usage

After enabling, bind `stt-ptt start` and `stt-ptt stop` to a key:

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe (types result)
stt-ptt stop
```

### Keybinding Examples

#### Hyprland

```nix
wayland.windowManager.hyprland.settings = {
  bind = [
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
# Press to start recording, release to transcribe
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

## Configuration Examples

### Basic Setup

```nix
cli.stt-ptt = {
  enable = true;
};
```

### Fast English Transcription

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";
  notifyTimeout = 2000;
};
```

### Language-Specific Transcription

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  language = "es"; # Force Spanish transcription
};
```

### High Quality with NVIDIA GPU

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3";
  whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
};
```

### Vulkan GPU Acceleration

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp-vulkan;
};
```

### AMD GPU with ROCm

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
};
```

### Balanced Setup

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-small";
  notifyTimeout = 3000;
};
```

## File Locations

| Path | Description |
|------|-------------|
| `~/.local/share/stt-ptt/models/` | Downloaded Whisper models |
| `~/.cache/stt-ptt/stt.wav` | Temporary audio recording |
| `~/.cache/stt-ptt/stt.pid` | PID file for recording process |

## Environment Variables

The module sets these automatically:

| Variable | Value |
|----------|-------|
| `STT_MODEL` | `~/.local/share/stt-ptt/models/.bin` |
| `STT_LANGUAGE` | Configured language ("auto" by default) |
| `STT_NOTIFY_TIMEOUT` | Configured timeout in ms |

## Requirements

- Wayland compositor (wtype is Wayland-only)
- PipeWire for audio recording
- Desktop notification daemon

## Troubleshooting

### Model Download Failed

The model downloads on first `home-manager switch`. If it fails:

```bash
# Manual download
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

### Transcription Too Slow

Use a smaller model or enable GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-tiny.en"; # Much faster
};
```

### Text Not Appearing

1. Ensure you're on Wayland: `echo $XDG_SESSION_TYPE`
2. Check if wtype works: `wtype "test"`
3. Some apps may need focus; try clicking the text field first

## Related

- [stt-ptt Package](../../../packages/stt-ptt.md) - Package documentation
- [Using Modules Guide](../../../guides/using-modules.md) - Module usage patterns