nixpkgs/docs/modules/home-manager/cli/stt-ptt.md
m3tm3re 00b858fbbe docs: update documentation for latest changes
- Add stt-ptt language support documentation
- Add rofi-project-opener module documentation
- Add rofi-project-opener package documentation
- Update zellij-ps documentation
- Update guides and reference patterns
- Update AGENTS.md with latest commands
2026-01-10 19:12:45 +01:00

stt-ptt Home Manager Module

Push-to-talk speech-to-text for Home Manager.

Overview

This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.

Quick Start

# Assumes the m3ta-nixpkgs flake input is passed to modules, e.g. via extraSpecialArgs.
{m3ta-nixpkgs, ...}: {
  imports = [m3ta-nixpkgs.homeManagerModules.default];

  cli.stt-ptt = {
    enable = true;
  };
}

This will:

  • Install stt-ptt with the default whisper-cpp package
  • Download the ggml-large-v3-turbo model on first activation
  • Set environment variables for the model path, language, and notification timeout
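
The Quick Start module refers to m3ta-nixpkgs, which has to come from a flake input. The following is a minimal sketch of a flake.nix that makes that input available to any Home Manager module naming m3ta-nixpkgs in its argument set. The input URL, the user name "alice", the system, and the ./home.nix path are placeholders, not values taken from this repository.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    home-manager = {
      url = "github:nix-community/home-manager";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    # Placeholder URL: replace with the real location of the m3ta-nixpkgs flake.
    m3ta-nixpkgs.url = "git+https://example.invalid/m3ta-nixpkgs";
  };

  outputs = { nixpkgs, home-manager, m3ta-nixpkgs, ... }: {
    # "alice" and x86_64-linux are illustrative; adjust to your user and system.
    homeConfigurations."alice" = home-manager.lib.homeManagerConfiguration {
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
      # Makes m3ta-nixpkgs available as a module argument ({ m3ta-nixpkgs, ... }:).
      extraSpecialArgs = { inherit m3ta-nixpkgs; };
      modules = [ ./home.nix ];
    };
  };
}
```

Passing the input through extraSpecialArgs rather than _module.args matters here, because the value is used inside imports, which is evaluated before ordinary module arguments are available.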

Module Options

cli.stt-ptt.enable

Enable the stt-ptt module.

  • Type: boolean
  • Default: false

cli.stt-ptt.whisperPackage

The whisper-cpp package to use for transcription.

  • Type: package
  • Default: pkgs.whisper-cpp

Pre-built variants:

# CPU (default)
whisperPackage = pkgs.whisper-cpp;

# Vulkan GPU acceleration (pre-built)
whisperPackage = pkgs.whisper-cpp-vulkan;

Override options (can be combined):

Option          Description
cudaSupport     NVIDIA CUDA acceleration
rocmSupport     AMD ROCm acceleration
vulkanSupport   Vulkan GPU acceleration
coreMLSupport   Apple CoreML (macOS only)
metalSupport    Apple Metal (macOS ARM only)

# NVIDIA CUDA support
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };

# AMD ROCm support
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };

# Vulkan support (manual override)
whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
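
Since the override options can be combined, one way to keep the configuration readable is to bind the customized package to a name first. The sketch below assumes a module that receives pkgs; the name whisperGpu is made up for illustration, and whether a particular combination of flags actually builds depends on the nixpkgs revision in use.

```nix
{ pkgs, ... }:
let
  # Illustrative name. Combines two of the override flags from the table above;
  # not verified to build on every nixpkgs revision.
  whisperGpu = pkgs.whisper-cpp.override {
    cudaSupport = true;
    vulkanSupport = true;
  };
in {
  cli.stt-ptt = {
    enable = true;
    whisperPackage = whisperGpu;
  };
}
```

Overridden packages are usually not in the binary cache, so expect a local build of whisper-cpp on the first switch.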

cli.stt-ptt.model

The Whisper model to use. Models are automatically downloaded from Hugging Face on first activation.

  • Type: string
  • Default: "ggml-large-v3-turbo"

Available models (sorted by size):

Model                 Size    Notes
ggml-tiny             75MB    Fastest, lowest quality
ggml-tiny.en          75MB    English-only, slightly faster
ggml-base             142MB   Fast, basic quality
ggml-base.en          142MB   English-only
ggml-small            466MB   Balanced speed/quality
ggml-small.en         466MB   English-only
ggml-medium           1.5GB   Good quality
ggml-medium.en        1.5GB   English-only
ggml-large-v1         2.9GB   High quality (original)
ggml-large-v2         2.9GB   High quality (improved)
ggml-large-v3         2.9GB   Highest quality
ggml-large-v3-turbo   1.6GB   High quality, optimized speed (recommended)

Quantized versions (q5_0, q5_1, q8_0) are also available for reduced size.
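
Because the option is a plain string, a quantized variant would presumably be selected by its file name without the .bin extension. The sketch below assumes the upstream naming scheme <model>-<quantization> and that the module downloads whatever name it is given; check that the file exists in the model repository before relying on it.

```nix
{
  cli.stt-ptt = {
    enable = true;
    # Assumed name: the q5_0 quantization of large-v3-turbo, following the
    # upstream "<model>-<quantization>.bin" file naming.
    model = "ggml-large-v3-turbo-q5_0";
  };
}
```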

cli.stt-ptt.notifyTimeout

Notification timeout in milliseconds for the recording indicator.

  • Type: integer
  • Default: 3000
  • Example: 5000 (5 seconds), 0 (persistent)

cli.stt-ptt.language

Language for speech recognition. Use "auto" for automatic language detection, or specify a language code for better accuracy.

  • Type: enum ["auto", "en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "hi", "tr", "pl", "nl", "sv", "da", "fi", "no", "vi", "th", "id", "uk", "cs"]
  • Default: "auto"

Auto-detection: When set to "auto", whisper.cpp analyzes the audio to determine the spoken language automatically.

Language specification: Specifying a language code improves transcription accuracy if you know the language in advance.

# Automatic language detection (default)
language = "auto";

# Force English transcription
language = "en";

# Spanish transcription
language = "es";

Common language codes:

Code   Language
en     English
es     Spanish
fr     French
de     German
zh     Chinese
ja     Japanese
ko     Korean

whisper.cpp itself supports roughly 100 languages, but this option accepts only the codes listed in the enum above. See the whisper.cpp documentation for the full list.

Usage

After enabling the module, bind stt-ptt start to a key press and stt-ptt stop to the release of the same key:

# Start recording
stt-ptt start

# Stop recording and transcribe (types result)
stt-ptt stop

Keybinding Examples

Hyprland

wayland.windowManager.hyprland.settings = {
  bind = [
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    "SUPER, V, exec, stt-ptt stop"
  ];
};

Or in hyprland.conf:

# Press to start recording, release to transcribe
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop

Sway

bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
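
If Sway is itself managed through Home Manager, the same two bindings can be declared in Nix. This is a sketch that assumes wayland.windowManager.sway is enabled and its config attribute is not null; it reads the modifier from the Home Manager Sway options instead of relying on a $mod variable being defined in the generated config.

```nix
{ config, ... }:
let
  # Modifier configured for Sway in Home Manager (e.g. "Mod4").
  mod = config.wayland.windowManager.sway.config.modifier;
in {
  wayland.windowManager.sway.extraConfig = ''
    # Press to start recording, release to transcribe
    bindsym --no-repeat ${mod}+v exec stt-ptt start
    bindsym --release ${mod}+v exec stt-ptt stop
  '';
}
```

The bindings go through extraConfig because they need the --no-repeat and --release flags in front of the key combination.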

Configuration Examples

Basic Setup

cli.stt-ptt = {
  enable = true;
};

Fast English Transcription

cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";
  notifyTimeout = 2000;
};

Language-Specific Transcription

cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  language = "es";  # Force Spanish transcription
};

High Quality with NVIDIA GPU

cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3";
  whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
};

Vulkan GPU Acceleration

cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp-vulkan;
};

AMD GPU with ROCm

cli.stt-ptt = {
  enable = true;
  model = "ggml-large-v3-turbo";
  whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
};

Balanced Setup

cli.stt-ptt = {
  enable = true;
  model = "ggml-small";
  notifyTimeout = 3000;
};

File Locations

Path                             Description
~/.local/share/stt-ptt/models/   Downloaded Whisper models
~/.cache/stt-ptt/stt.wav         Temporary audio recording
~/.cache/stt-ptt/stt.pid         PID file for recording process

Environment Variables

The module sets these automatically:

Variable             Value
STT_MODEL            ~/.local/share/stt-ptt/models/<model>.bin
STT_LANGUAGE         Configured language ("auto" by default)
STT_NOTIFY_TIMEOUT   Configured timeout in ms
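
To make the mapping from options to variables concrete, here is a rough sketch of how a module could derive these values. It is not the module's actual source, which is not shown in this document; it only illustrates the relationship in the table and assumes the default XDG data directory.

```nix
{ config, ... }:
let
  cfg = config.cli.stt-ptt;
in {
  # Illustrative only: the real module already sets these, so do not add this
  # to your own configuration.
  home.sessionVariables = {
    STT_MODEL = "${config.xdg.dataHome}/stt-ptt/models/${cfg.model}.bin";
    STT_LANGUAGE = cfg.language;
    STT_NOTIFY_TIMEOUT = toString cfg.notifyTimeout;
  };
}
```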

Requirements

  • Wayland compositor (the transcript is typed with wtype, which is Wayland-only)
  • PipeWire for audio recording
  • Desktop notification daemon

Troubleshooting

Model Download Failed

The model downloads on first home-manager switch. If it fails:

# Manual download
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
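
A declarative alternative to the manual download is to let Nix fetch the file and link it into the models directory. The sketch below is an assumption-laden workaround, not something the module documents: it assumes stt-ptt and the activation step accept a symlinked model file, and it uses a placeholder hash that must be replaced with the value Nix reports on the first build.

```nix
{ pkgs, lib, ... }: {
  xdg.dataFile."stt-ptt/models/ggml-large-v3-turbo.bin".source = pkgs.fetchurl {
    url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin";
    # Placeholder: the first build fails and prints the real hash to paste here.
    hash = lib.fakeHash;
  };
}
```

Note that this keeps a copy of the model in the Nix store, which costs disk space on top of any copy the module downloads itself.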

Transcription Too Slow

Use a smaller model or enable GPU acceleration:

cli.stt-ptt = {
  enable = true;
  model = "ggml-tiny.en";  # Much faster
};

Text Not Appearing

  1. Ensure you're on Wayland: echo $XDG_SESSION_TYPE
  2. Check if wtype works: wtype "test"
  3. Some apps may need focus; try clicking the text field first