# stt-ptt Home Manager Module

Push-to-talk speech-to-text for Home Manager.

## Overview

This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.

## Quick Start

```nix
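{config, ...}: {
  # `m3ta-nixpkgs` is the flake input and must be in scope here, e.g. because this
  # module is defined inside flake.nix or the input is passed via extraSpecialArgs
  imports = [m3ta-nixpkgs.homeManagerModules.default];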

  cli.stt-ptt = {
    enable = true;
  };
}
```

This will:
- Install stt-ptt with the default (CPU) whisper-cpp build
- Download the `ggml-large-v3-turbo` model on first activation
- Set environment variables for the model path, language, and notification timeout
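The Quick Start assumes `m3ta-nixpkgs` is already in scope inside your Home Manager module. One common way to arrange that in a flake-based, standalone Home Manager setup is to pass the flake input through `extraSpecialArgs`. The sketch below assumes that layout; the `m3ta-nixpkgs` input URL, the `alice` username, and the `x86_64-linux` system are placeholders, not values taken from this project.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    home-manager = {
      url = "github:nix-community/home-manager";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    # Placeholder: replace with the actual m3ta-nixpkgs flake URL
    m3ta-nixpkgs.url = "<m3ta-nixpkgs flake URL>";
  };

  outputs = { nixpkgs, home-manager, m3ta-nixpkgs, ... }: {
    # "alice" and "x86_64-linux" are placeholders for your user and system
    homeConfigurations.alice = home-manager.lib.homeManagerConfiguration {
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
      # Makes `m3ta-nixpkgs` available as an argument to every module
      extraSpecialArgs = { inherit m3ta-nixpkgs; };
      modules = [ ./home.nix ];
    };
  };
}
```

With this wiring, `./home.nix` can list the input in its argument set, e.g. `{m3ta-nixpkgs, ...}:`, and use the Quick Start body unchanged.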

## Module Options

### `cli.stt-ptt.enable`

Enable the stt-ptt module.

- Type: `boolean`
- Default: `false`

### `cli.stt-ptt.whisperPackage`

The whisper-cpp package to use for transcription.

- Type: `package`
- Default: `pkgs.whisper-cpp`

**Pre-built variants:**

```nix
# CPU (default)
whisperPackage = pkgs.whisper-cpp;

# Vulkan GPU acceleration (pre-built)
whisperPackage = pkgs.whisper-cpp-vulkan;
```

**Override options** (can be combined):

| Option | Description |
|--------|-------------|
| `cudaSupport` | NVIDIA CUDA acceleration |
| `rocmSupport` | AMD ROCm acceleration |
| `vulkanSupport` | Vulkan GPU acceleration |
| `coreMLSupport` | Apple CoreML (macOS only) |
| `metalSupport` | Apple Metal (macOS ARM only) |

```nix
# NVIDIA CUDA support
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };

# AMD ROCm support
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };

# Vulkan support (manual override)
whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
```
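Because `whisperPackage` is an ordinary package option, the variant can be chosen per machine. The sketch below selects one from a hypothetical `gpu` string in a `let` binding; that binding and its values are illustrative and are not module options. Overridden packages that are not in a binary cache are built locally, which can take a while, and CUDA builds additionally require allowing unfree packages in nixpkgs.

```nix
{ pkgs, ... }:
let
  # Hypothetical per-host setting: "nvidia", "amd", "vulkan", or "cpu"
  gpu = "vulkan";

  whisperFor = {
    nvidia = pkgs.whisper-cpp.override { cudaSupport = true; };
    amd = pkgs.whisper-cpp.override { rocmSupport = true; };
    vulkan = pkgs.whisper-cpp-vulkan;
    cpu = pkgs.whisper-cpp;
  };
in {
  cli.stt-ptt = {
    enable = true;
    # Attribute values are evaluated lazily, so only the selected variant is built
    whisperPackage = whisperFor.${gpu};
  };
}
```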

### `cli.stt-ptt.model`

The Whisper model to use. The model is downloaded automatically from Hugging Face on first activation.

- Type: `string`
- Default: `"ggml-large-v3-turbo"`

Available models (sorted by size):

| Model | Size | Notes |
|-------|------|-------|
| `ggml-tiny` | 75MB | Fastest, lowest quality |
| `ggml-tiny.en` | 75MB | English-only |
| `ggml-base` | 142MB | Fast, basic quality |
| `ggml-base.en` | 142MB | English-only |
| `ggml-small` | 466MB | Balanced speed/quality |
| `ggml-small.en` | 466MB | English-only |
| `ggml-medium` | 1.5GB | Good quality |
| `ggml-medium.en` | 1.5GB | English-only |
| `ggml-large-v3-turbo` | 1.6GB | High quality, optimized speed (recommended) |
| `ggml-large-v1` | 2.9GB | High quality (original) |
| `ggml-large-v2` | 2.9GB | High quality (improved) |
| `ggml-large-v3` | 2.9GB | Highest quality |

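Quantized variants, named with a suffix such as `-q5_0`, `-q5_1`, or `-q8_0` (for example `ggml-large-v3-turbo-q5_0`), are also available. They are smaller at a small cost in accuracy, and not every model is published in every quantization.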
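For example, to shrink the download you could select a quantized turbo model. This sketch assumes the module derives the download URL and file name directly from `model` (as the manual download command under Troubleshooting suggests) and that the chosen name is published in the upstream model repository; check the repository listing before relying on a particular name.

```nix
cli.stt-ptt = {
  enable = true;
  # Quantized variant: the base model name plus a quantization suffix
  model = "ggml-large-v3-turbo-q5_0";
};
```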

### `cli.stt-ptt.notifyTimeout`

Notification timeout in milliseconds for the recording indicator.

- Type: `integer`
- Default: `3000`
- Examples: `5000` (5 seconds), `0` (persistent)

### `cli.stt-ptt.language`

Language for speech recognition. Use `"auto"` for automatic language detection, or set a specific language code for better accuracy.

- Type: `enum ["auto", "en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "hi", "tr", "pl", "nl", "sv", "da", "fi", "no", "vi", "th", "id", "uk", "cs"]`
- Default: `"auto"`

**Auto-detection**: With `"auto"`, whisper.cpp analyzes the audio to determine the spoken language.

**Language specification**: If you know in advance which language you will speak, setting its code usually improves transcription accuracy.

```nix
# Automatic language detection (default)
language = "auto";

# Force English transcription
language = "en";

# Spanish transcription
language = "es";
```

**Common language codes:**

| Code | Language |
|------|----------|
| `en` | English |
| `es` | Spanish |
| `fr` | French |
| `de` | German |
| `zh` | Chinese |
| `ja` | Japanese |
| `ko` | Korean |

whisper.cpp itself supports about 100 languages (see the whisper.cpp documentation for the full list), but this option only accepts the codes listed in its type above.
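The `.en` models are trained on English only, so the language setting has no useful effect with them; for any other language, pick a multilingual model. A minimal sketch of keeping model and language consistent with a single hypothetical `englishOnly` flag (the flag is illustrative, not a module option):

```nix
let
  # Hypothetical switch: true for English-only use, false for other languages
  englishOnly = false;
in {
  cli.stt-ptt = {
    enable = true;
    # `.en` models only transcribe English; multilingual models cover the rest
    model = if englishOnly then "ggml-small.en" else "ggml-small";
    language = if englishOnly then "en" else "auto";
  };
}
```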

## Usage

After enabling the module, bind `stt-ptt start` to a key press and `stt-ptt stop` to its release:

```bash
# Start recording
stt-ptt start

# Stop recording, transcribe, and type the result into the focused window
stt-ptt stop
```

### Keybinding Examples

#### Hyprland

```nix
wayland.windowManager.hyprland.settings = {
  bind = [
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
# Press to start recording, release to transcribe
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```
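If Sway is managed by Home Manager, the same two bindings can be added through `wayland.windowManager.sway.extraConfig`, which appends raw lines to the generated Sway config. This sketch assumes Super (`Mod4`) as the modifier instead of relying on a `$mod` variable being defined.

```nix
wayland.windowManager.sway.extraConfig = ''
  # Press Super+V to start recording, release it to transcribe
  bindsym --no-repeat Mod4+v exec stt-ptt start
  bindsym --release Mod4+v exec stt-ptt stop
'';
```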

## Configuration Examples

### Basic Setup

```nix
cli.stt-ptt = {
  enable = true;
};
```

### Fast English Transcription

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";
  notifyTimeout = 2000;
};
```

### Language-Specific Transcription

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  language = "es";  # Force Spanish transcription
};
```

### High Quality with NVIDIA GPU

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3";
  whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
};
```

### Vulkan GPU Acceleration

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp-vulkan;
};
```

### AMD GPU with ROCm

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
};
```

### Balanced Setup

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-small";
  notifyTimeout = 3000;
};
```
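Putting the pieces together, a single module could enable stt-ptt, pick a GPU build, and bind the Hyprland push-to-talk key. This is a sketch combining the examples above; it assumes `m3ta-nixpkgs` is passed in as a module argument (for example via `extraSpecialArgs`) and that Hyprland itself is enabled elsewhere in your Home Manager configuration.

```nix
{ pkgs, m3ta-nixpkgs, ... }: {
  imports = [ m3ta-nixpkgs.homeManagerModules.default ];

  cli.stt-ptt = {
    enable = true;
    model = "ggml-large-v3-turbo";
    language = "en";
    notifyTimeout = 2000;
    whisperPackage = pkgs.whisper-cpp-vulkan;
  };

  # Press Super+V to record, release to transcribe and type the text
  wayland.windowManager.hyprland.settings = {
    bind = [ "SUPER, V, exec, stt-ptt start" ];
    bindr = [ "SUPER, V, exec, stt-ptt stop" ];
  };
}
```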

## File Locations

| Path | Description |
|------|-------------|
| `~/.local/share/stt-ptt/models/` | Downloaded Whisper models |
| `~/.cache/stt-ptt/stt.wav` | Temporary audio recording |
| `~/.cache/stt-ptt/stt.pid` | PID file for recording process |

## Environment Variables

The module sets these automatically:

| Variable | Value |
|----------|-------|
| `STT_MODEL` | `~/.local/share/stt-ptt/models/<model>.bin` |
| `STT_LANGUAGE` | Configured language (`auto` by default) |
| `STT_NOTIFY_TIMEOUT` | Configured timeout in ms |
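The mapping from options to variables can be pictured with the Home Manager snippet below. It is an illustration only, not necessarily how the module implements it, and the module already sets these values, so there is no need to add it to your configuration. If the module does use `home.sessionVariables`, changes typically take effect only in new login sessions.

```nix
{ config, ... }:
let
  cfg = config.cli.stt-ptt;
in {
  home.sessionVariables = {
    # <model> in the table above is the value of cli.stt-ptt.model
    STT_MODEL = "${config.home.homeDirectory}/.local/share/stt-ptt/models/${cfg.model}.bin";
    STT_LANGUAGE = cfg.language;
    STT_NOTIFY_TIMEOUT = toString cfg.notifyTimeout;
  };
}
```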

## Requirements

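- A Wayland compositor that supports the virtual-keyboard protocol used by `wtype` (Hyprland and Sway do); `wtype` types the transcribed text and is Wayland-only
- PipeWire for audio recording
- A desktop notification daemon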

## Troubleshooting

### Model Download Failed

The model is downloaded during the first `home-manager switch`. If that download fails, fetch the file manually:

```bash
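# Manual download (replace ggml-large-v3-turbo with your configured model)
mkdir -p ~/.local/share/stt-ptt/models
# -f makes curl fail on HTTP errors instead of saving an error page as the model
curl -fL -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin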
```
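As an alternative to downloading at activation time, the model could be fetched by Nix and linked into the expected location with `xdg.dataFile`. This sketch rests on assumptions: that stt-ptt accepts a symlinked model file, that the module's download step skips files that already exist, and that your `xdg.dataHome` is `~/.local/share`. Start with `lib.fakeHash`, run `home-manager switch`, and copy the real hash from the resulting hash-mismatch error. Remove any previously downloaded copy first, because Home Manager refuses to overwrite a file it does not manage.

```nix
{ lib, pkgs, ... }: {
  # Places the store copy at ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin
  xdg.dataFile."stt-ptt/models/ggml-large-v3-turbo.bin".source = pkgs.fetchurl {
    url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin";
    # Placeholder: replace with the hash reported by the first build
    hash = lib.fakeHash;
  };
}
```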

### Transcription Too Slow

Use a smaller model, or enable GPU acceleration through `whisperPackage` (see the GPU examples under Configuration Examples):

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-tiny.en";  # Much faster, English-only
};
```

### Text Not Appearing

1. Ensure you're on Wayland: `echo $XDG_SESSION_TYPE`
2. Check if wtype works: `wtype "test"`
3. The text is typed into whichever window has keyboard focus; click into the target text field before you stop recording

## Related

- [stt-ptt Package](../../../packages/stt-ptt.md) - Package documentation
- [Using Modules Guide](../../../guides/using-modules.md) - Module usage patterns