# stt-ptt Home Manager Module

Push to Talk Speech to Text for Home Manager.

## Overview

This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.

## Quick Start

```nix
{config, ...}: {
  imports = [m3ta-nixpkgs.homeManagerModules.default];

  cli.stt-ptt = {
    enable = true;
  };
}
```

This will:

- Install stt-ptt with the default `whisper-cpp` package
- Download the `ggml-large-v3-turbo` model on first activation
- Set environment variables for the model path and the notification timeout

## Module Options

### `cli.stt-ptt.enable`

Enable the stt-ptt module.

- Type: `boolean`
- Default: `false`

### `cli.stt-ptt.whisperPackage`

The whisper-cpp package to use for transcription.

- Type: `package`
- Default: `pkgs.whisper-cpp`

**Pre-built variants:**

```nix
# CPU (default)
whisperPackage = pkgs.whisper-cpp;

# Vulkan GPU acceleration (pre-built)
whisperPackage = pkgs.whisper-cpp-vulkan;
```

**Override options** (can be combined):

| Option | Description |
|--------|-------------|
| `cudaSupport` | NVIDIA CUDA acceleration |
| `rocmSupport` | AMD ROCm acceleration |
| `vulkanSupport` | Vulkan GPU acceleration |
| `coreMLSupport` | Apple CoreML (macOS only) |
| `metalSupport` | Apple Metal (macOS ARM only) |

```nix
# NVIDIA CUDA support
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };

# AMD ROCm support
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };

# Vulkan support (manual override)
whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
```

### `cli.stt-ptt.model`

The Whisper model to use. Models are automatically downloaded from HuggingFace on first activation.

- Type: `string`
- Default: `"ggml-large-v3-turbo"`

Available models (sorted by size):

| Model | Size | Notes |
|-------|------|-------|
| `ggml-tiny` | 75 MB | Fastest, lowest quality |
| `ggml-tiny.en` | 75 MB | English-only, slightly faster |
| `ggml-base` | 142 MB | Fast, basic quality |
| `ggml-base.en` | 142 MB | English-only |
| `ggml-small` | 466 MB | Balanced speed/quality |
| `ggml-small.en` | 466 MB | English-only |
| `ggml-medium` | 1.5 GB | Good quality |
| `ggml-medium.en` | 1.5 GB | English-only |
| `ggml-large-v1` | 2.9 GB | High quality (original) |
| `ggml-large-v2` | 2.9 GB | High quality (improved) |
| `ggml-large-v3` | 2.9 GB | Highest quality |
| `ggml-large-v3-turbo` | 1.6 GB | High quality, optimized speed (recommended) |

Quantized versions (`q5_0`, `q5_1`, `q8_0`) are also available for reduced size.

### `cli.stt-ptt.notifyTimeout`

Notification timeout in milliseconds for the recording indicator.

- Type: `integer`
- Default: `3000`
- Example: `5000` (5 seconds), `0` (persistent)

## Usage

After enabling the module, bind `stt-ptt start` to a key press and `stt-ptt stop` to the release of the same key:

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe (the result is typed out)
stt-ptt stop
```

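Where a key release cannot be bound, a single toggle key is one possible alternative. The following is a minimal sketch, assuming that the PID file listed under File Locations holds the PID of the recorder while a recording is in progress. The names `stt_next_action`, `stt_toggle`, and `STT_PID_FILE` are hypothetical and not part of stt-ptt.

```bash
#!/usr/bin/env bash
# Sketch only: pick "start" or "stop" for a single toggle key.
# Assumption: a live PID in the PID file means a recording is running.

# Hypothetical override variable; the default is the documented PID file path.
STT_PID_FILE="${STT_PID_FILE:-$HOME/.cache/stt-ptt/stt.pid}"

# Print "stop" if the PID file names a running process, otherwise "start".
stt_next_action() {
  local pid_file="${1:-$STT_PID_FILE}"
  local pid
  if [ -r "$pid_file" ] && pid="$(cat "$pid_file")" && [ -n "$pid" ] \
     && kill -0 "$pid" 2>/dev/null; then
    echo stop
  else
    echo start
  fi
}

# Run the matching stt-ptt subcommand.
stt_toggle() {
  stt-ptt "$(stt_next_action "$@")"
}
```

Calling `stt_toggle` from one key binding would then alternate between starting and stopping. A stale or malformed PID file is treated as "not recording", so the next press starts a fresh recording.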
### Keybinding Examples

#### Hyprland

```nix
wayland.windowManager.hyprland.settings = {
  bind = [
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
# Press to start recording, release to transcribe
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

## Configuration Examples

### Basic Setup

```nix
cli.stt-ptt = {
  enable = true;
};
```

### Fast English Transcription

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";
  notifyTimeout = 2000;
};
```

### High Quality with NVIDIA GPU

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3";
  whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
};
```

### Vulkan GPU Acceleration

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp-vulkan;
};
```

### AMD GPU with ROCm

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
};
```

### Balanced Setup

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-small";
  notifyTimeout = 3000;
};
```

## File Locations

| Path | Description |
|------|-------------|
| `~/.local/share/stt-ptt/models/` | Downloaded Whisper models |
| `~/.cache/stt-ptt/stt.wav` | Temporary audio recording |
| `~/.cache/stt-ptt/stt.pid` | PID file for recording process |

## Environment Variables

The module sets these automatically:

| Variable | Value |
|----------|-------|
| `STT_MODEL` | `~/.local/share/stt-ptt/models/<model>.bin` |
| `STT_NOTIFY_TIMEOUT` | Configured timeout in ms |

## Requirements

- Wayland compositor (wtype is Wayland-only)
- PipeWire for audio recording
- Desktop notification daemon

## Troubleshooting

### Model Download Failed

The model is downloaded during the first `home-manager switch`. If that fails, download it manually:

```bash
# Manual download
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

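After a manual download, it can help to confirm that the file landed where the module expects it. The sketch below assumes the `<model>.bin` layout shown under Environment Variables. The helper names `stt_model_path` and `stt_check_model` are hypothetical, and the check only tests that the file exists and is non-empty, not that it is a valid model.

```bash
#!/usr/bin/env bash
# Sketch only: confirm a model file exists and is non-empty.

# Build the expected path for a model name (default: the module's default model).
stt_model_path() {
  local model="${1:-ggml-large-v3-turbo}"
  printf '%s/.local/share/stt-ptt/models/%s.bin\n' "$HOME" "$model"
}

# Check the given path, else $STT_MODEL, else the default model path.
stt_check_model() {
  local path="${1:-${STT_MODEL:-$(stt_model_path)}}"
  if [ -s "$path" ]; then
    echo "ok: $path"
  else
    echo "missing or empty: $path" >&2
    return 1
  fi
}
```

Running `stt_check_model` with no arguments checks the path in `STT_MODEL` if it is set. An interrupted download can still leave a truncated, non-empty file, so comparing the size against the model table is a reasonable extra step.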
### Transcription Too Slow

Use a smaller model, or enable GPU acceleration through `cli.stt-ptt.whisperPackage`:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-tiny.en";  # Much faster
};
```

### Text Not Appearing

1. Ensure you're on Wayland: `echo "$XDG_SESSION_TYPE"` should print `wayland`
2. Check if wtype works: `wtype "test"`
3. Some apps may need focus; try clicking the text field first

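The first two checks above can be scripted. This is a minimal sketch with hypothetical function names. It only inspects the session type and looks for `wtype` on `PATH`, and it does not type anything.

```bash
#!/usr/bin/env bash
# Sketch only: automate the session-type and wtype checks.

# Succeeds only when the session type is "wayland".
stt_is_wayland() {
  [ "${XDG_SESSION_TYPE:-}" = "wayland" ]
}

# Succeeds when the given command (default: wtype) is on PATH.
stt_has_command() {
  command -v "${1:-wtype}" >/dev/null 2>&1
}

# Print one line per check; return non-zero if any check fails.
stt_diagnose() {
  local status=0
  if stt_is_wayland; then
    echo "session: wayland"
  else
    echo "session: ${XDG_SESSION_TYPE:-unset} (wtype needs Wayland)"
    status=1
  fi
  if stt_has_command wtype; then
    echo "wtype: found"
  else
    echo "wtype: not found on PATH"
    status=1
  fi
  return "$status"
}
```

If `stt_diagnose` reports both checks as passing and text still does not appear, focus (step 3) is the remaining suspect.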
## Related

- [stt-ptt Package](../../../packages/stt-ptt.md) - Package documentation
- [Using Modules Guide](../../../guides/using-modules.md) - Module usage patterns