feat: add stt-ptt package

2026-01-02 12:24:48 +01:00
parent 44485c4c72
commit de1301e08d
8 changed files with 670 additions and 0 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -34,6 +34,7 @@ Documentation for all custom packages:
 - [mem0](./packages/mem0.md) - AI memory assistant with vector storage
 - [msty-studio](./packages/msty-studio.md) - Msty Studio application
 - [pomodoro-timer](./packages/pomodoro-timer.md) - Pomodoro timer utility
+- [stt-ptt](./packages/stt-ptt.md) - Push to Talk Speech to Text using Whisper
 - [tuxedo-backlight](./packages/tuxedo-backlight.md) - Backlight control for Tuxedo laptops
 - [zellij-ps](./packages/zellij-ps.md) - Project switcher for Zellij

@@ -49,6 +50,7 @@ Configuration modules for NixOS and Home Manager:
 #### Home Manager Modules
 - [Overview](./modules/home-manager/overview.md) - Home Manager modules overview
 - [CLI Tools](./modules/home-manager/cli/) - CLI-related modules
+  - [stt-ptt](./modules/home-manager/cli/stt-ptt.md) - Push to Talk Speech to Text
  - [zellij-ps](./modules/home-manager/cli/zellij-ps.md) - Zellij project switcher
 - [Coding](./modules/home-manager/coding/) - Development-related modules
  - [editors](./modules/home-manager/coding/editors.md) - Editor configurations
--- a/docs/modules/home-manager/cli/stt-ptt.md
+++ b/docs/modules/home-manager/cli/stt-ptt.md
@@ -0,0 +1,265 @@
+# stt-ptt Home Manager Module
+
+Push to Talk Speech to Text for Home Manager.
+
+## Overview
+
+This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.
+
+## Quick Start
+
+```nix
+# Assumes the flake inputs are passed to Home Manager (e.g. via extraSpecialArgs)
+{inputs, ...}: {
+  imports = [inputs.m3ta-nixpkgs.homeManagerModules.default];
+
+  cli.stt-ptt = {
+    enable = true;
+  };
+}
+```
+
+This will:
+- Install stt-ptt with the default `pkgs.whisper-cpp` package
+- Download the `ggml-large-v3-turbo` model on first activation
+- Set environment variables for model path and notification timeout
+
+## Module Options
+
+### `cli.stt-ptt.enable`
+
+Enable the stt-ptt module.
+
+- Type: `boolean`
+- Default: `false`
+
+### `cli.stt-ptt.whisperPackage`
+
+The whisper-cpp package to use for transcription.
+
+- Type: `package`
+- Default: `pkgs.whisper-cpp`
+
+**Pre-built variants:**
+
+```nix
+# CPU (default)
+whisperPackage = pkgs.whisper-cpp;
+
+# Vulkan GPU acceleration (pre-built)
+whisperPackage = pkgs.whisper-cpp-vulkan;
+```
+
+**Override options** (can be combined):
+
+| Option | Description |
+|--------|-------------|
+| `cudaSupport` | NVIDIA CUDA acceleration |
+| `rocmSupport` | AMD ROCm acceleration |
+| `vulkanSupport` | Vulkan GPU acceleration |
+| `coreMLSupport` | Apple CoreML (macOS only) |
+| `metalSupport` | Apple Metal (macOS ARM only) |
+
+```nix
+# NVIDIA CUDA support
+whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
+
+# AMD ROCm support
+whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
+
+# Vulkan support (manual override)
+whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
+```
+
+### `cli.stt-ptt.model`
+
+The Whisper model to use. Models are automatically downloaded from HuggingFace on first activation.
+
+- Type: `string`
+- Default: `"ggml-large-v3-turbo"`
+
+Available models (sorted by size):
+
+| Model | Size | Notes |
+|-------|------|-------|
+| `ggml-tiny` | 75MB | Fastest, lowest quality |
+| `ggml-tiny.en` | 75MB | English-only, slightly faster |
+| `ggml-base` | 142MB | Fast, basic quality |
+| `ggml-base.en` | 142MB | English-only |
+| `ggml-small` | 466MB | Balanced speed/quality |
+| `ggml-small.en` | 466MB | English-only |
+| `ggml-medium` | 1.5GB | Good quality |
+| `ggml-medium.en` | 1.5GB | English-only |
+| `ggml-large-v3-turbo` | 1.6GB | High quality, optimized speed (recommended) |
+| `ggml-large-v1` | 2.9GB | High quality (original) |
+| `ggml-large-v2` | 2.9GB | High quality (improved) |
+| `ggml-large-v3` | 2.9GB | Highest quality |
+
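+Quantized variants with a reduced file size are published for some models. They carry a suffix such as `-q5_0`, `-q5_1`, or `-q8_0` (for example `ggml-large-v3-turbo-q5_0`); not every model is available in every quantization.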
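+
+As an illustration of the naming, a quantized model could be selected as shown here. This is a sketch that assumes the module passes the `model` string through to the download unchanged, which is not confirmed by this documentation; `ggml-large-v3-turbo-q5_0` is one of the quantized files published in the HuggingFace repository.
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  # Assumption: quantized names are accepted like any other model name.
+  model = "ggml-large-v3-turbo-q5_0";
+};
+```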
+
+### `cli.stt-ptt.notifyTimeout`
+
+Notification timeout in milliseconds for the recording indicator.
+
+- Type: `integer`
+- Default: `3000`
+- Example: `5000` (5 seconds), `0` (persistent)
+
+## Usage
+
+After enabling the module, bind `stt-ptt start` to a key press and `stt-ptt stop` to the release of the same key:
+
+```bash
+# Start recording
+stt-ptt start
+
+# Stop recording and transcribe (types result)
+stt-ptt stop
+```
+
+### Keybinding Examples
+
+#### Hyprland
+
+```nix
+wayland.windowManager.hyprland.settings = {
+  bind = [
+    "SUPER, V, exec, stt-ptt start"
+  ];
+  bindr = [
+    "SUPER, V, exec, stt-ptt stop"
+  ];
+};
+```
+
+Or in `hyprland.conf`:
+
+```conf
+# Press to start recording, release to transcribe
+bind = SUPER, V, exec, stt-ptt start
+bindr = SUPER, V, exec, stt-ptt stop
+```
+
+#### Sway
+
+```conf
+bindsym --no-repeat $mod+v exec stt-ptt start
+bindsym --release $mod+v exec stt-ptt stop
+```
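+
+If Sway itself is configured through Home Manager, the same two bindings can be declared in Nix. This is a minimal sketch, assuming Sway is managed via `wayland.windowManager.sway` and that the `bindsym` flags are written as part of the key name; `lib.mkOptionDefault` keeps the default keybindings in place.
+
+```nix
+{config, lib, ...}: let
+  # Reuse the modifier already configured for Sway (e.g. Mod4)
+  mod = config.wayland.windowManager.sway.config.modifier;
+in {
+  wayland.windowManager.sway.config.keybindings = lib.mkOptionDefault {
+    # Key press: start recording (ignore key repeat while held)
+    "--no-repeat ${mod}+v" = "exec stt-ptt start";
+    # Key release: stop recording and transcribe
+    "--release ${mod}+v" = "exec stt-ptt stop";
+  };
+}
+```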
+
+## Configuration Examples
+
+### Basic Setup
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+};
+```
+
+### Fast English Transcription
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-base.en";
+  notifyTimeout = 2000;
+};
+```
+
+### High Quality with NVIDIA GPU
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-large-v3";
+  whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
+};
+```
+
+### Vulkan GPU Acceleration
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-large-v3-turbo";
+  whisperPackage = pkgs.whisper-cpp-vulkan;
+};
+```
+
+### AMD GPU with ROCm
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-large-v3-turbo";
+  whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
+};
+```
+
+### Balanced Setup
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-small";
+  notifyTimeout = 3000;
+};
+```
+
+## File Locations
+
+| Path | Description |
+|------|-------------|
+| `~/.local/share/stt-ptt/models/` | Downloaded Whisper models |
+| `~/.cache/stt-ptt/stt.wav` | Temporary audio recording |
+| `~/.cache/stt-ptt/stt.pid` | PID file for recording process |
+
+## Environment Variables
+
+The module sets these automatically:
+
+| Variable | Value |
+|----------|-------|
+| `STT_MODEL` | `~/.local/share/stt-ptt/models/<model>.bin` |
+| `STT_NOTIFY_TIMEOUT` | Configured timeout in ms |
+
+## Requirements
+
+- Wayland compositor (wtype is Wayland-only)
+- PipeWire for audio recording
+- Desktop notification daemon
+
+## Troubleshooting
+
+### Model Download Failed
+
+The model is downloaded during the first `home-manager switch`. If that download fails, fetch the model manually:
+
+```bash
+# Manual download
+mkdir -p ~/.local/share/stt-ptt/models
+curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
+  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
+```
+
+### Transcription Too Slow
+
+Use a smaller model or enable GPU acceleration:
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-tiny.en";  # Much faster
+};
+```
+
+### Text Not Appearing
+
+1. Ensure you're on Wayland: `echo $XDG_SESSION_TYPE`
+2. Check if wtype works: `wtype "test"`
+3. Some apps may need focus; try clicking the text field first
+
+## Related
+
+- [stt-ptt Package](../../../packages/stt-ptt.md) - Package documentation
+- [Using Modules Guide](../../../guides/using-modules.md) - Module usage patterns
--- a/docs/packages/stt-ptt.md
+++ b/docs/packages/stt-ptt.md
@@ -0,0 +1,202 @@
+# stt-ptt
+
+Push to Talk Speech to Text using Whisper.
+
+## Description
+
+stt-ptt is a simple push-to-talk speech-to-text tool that uses whisper.cpp for transcription. It records audio via PipeWire, transcribes it using a local Whisper model, and types the result using wtype (Wayland).
+
+## Features
+
+- **Push to Talk**: Start/stop recording with simple commands
+- **Local Processing**: Uses whisper.cpp for fast, offline transcription
+- **Wayland Native**: Types transcribed text using wtype
+- **Configurable**: Model path and notification timeout via environment variables
+- **Lightweight**: Minimal dependencies, no cloud services
+
+## Installation
+
+### Via Home Manager Module (Recommended)
+
+See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.
+
+### Via Overlay
+
+```nix
+{pkgs, ...}: {
+  home.packages = [pkgs.stt-ptt];
+}
+```
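+
+The snippet above only works once the m3ta-nixpkgs overlay has been applied to `pkgs`. A minimal sketch of wiring it in follows; it assumes the flake exposes its overlay as `overlays.default` (an assumed output name, not confirmed here) and that `inputs` is passed to your modules.
+
+```nix
+{inputs, pkgs, ...}: {
+  # Assumption: the flake exports its overlay as `overlays.default`.
+  nixpkgs.overlays = [inputs.m3ta-nixpkgs.overlays.default];
+
+  home.packages = [pkgs.stt-ptt];
+}
+```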
+
+### Direct Reference
+
+```nix
+{pkgs, inputs, ...}: {
+  home.packages = [
+    inputs.m3ta-nixpkgs.packages.${pkgs.stdenv.hostPlatform.system}.stt-ptt
+  ];
+}
+```
+
+## Usage
+
+### Basic Commands
+
+```bash
+# Start recording
+stt-ptt start
+
+# Stop recording and transcribe
+stt-ptt stop
+```
+
+### Keybinding Setup
+
+The tool is designed to be bound to a key (e.g., hold to record, release to transcribe).
+
+#### Hyprland
+
+```nix
+# In your Hyprland config
+wayland.windowManager.hyprland.settings = {
+  bind = [
+    # Key press: start recording
+    "SUPER, V, exec, stt-ptt start"
+  ];
+  bindr = [
+    # Key release: stop recording and transcribe
+    "SUPER, V, exec, stt-ptt stop"
+  ];
+};
+```
+
+Or in `hyprland.conf`:
+
+```conf
+bind = SUPER, V, exec, stt-ptt start
+bindr = SUPER, V, exec, stt-ptt stop
+```
+
+#### Sway
+
+```conf
+# Hold to record, release to transcribe
+bindsym --no-repeat $mod+v exec stt-ptt start
+bindsym --release $mod+v exec stt-ptt stop
+```
+
+#### i3 (X11)
+
+stt-ptt does not work on X11 out of the box because it types text with wtype, which is Wayland-only. To use it with i3, you would need to modify the script to use an X11 tool such as xdotool.
+
+### Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
+| `STT_NOTIFY_TIMEOUT` | Notification timeout in ms | `3000` |
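+
+When the package is installed without the Home Manager module, these variables have to be set by hand to use non-default values. Here is a minimal sketch using Home Manager's `home.sessionVariables`, assuming the model file has already been downloaded to the path shown; the model choice and timeout are illustrative only.
+
+```nix
+{config, pkgs, ...}: {
+  home.packages = [pkgs.stt-ptt];
+
+  home.sessionVariables = {
+    # Illustrative: point stt-ptt at a smaller, manually downloaded model.
+    STT_MODEL = "${config.home.homeDirectory}/.local/share/stt-ptt/models/ggml-base.en.bin";
+    # Notification timeout in milliseconds.
+    STT_NOTIFY_TIMEOUT = "5000";
+  };
+}
+```
+
+Session variables are picked up by new login sessions, so log out and back in before testing the keybinding.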
+
+## Requirements
+
+- **whisper-cpp**: Speech recognition engine
+- **wtype**: Wayland text input (Wayland compositor required)
+- **libnotify**: Desktop notifications
+- **pipewire**: Audio recording
+
+## Model Setup
+
+Download a Whisper model from [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main):
+
+```bash
+# Create model directory
+mkdir -p ~/.local/share/stt-ptt/models
+
+# Download model (example: large-v3-turbo)
+curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
+  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
+```
+
+Or use the Home Manager module which handles this automatically.
+
+## Available Models
+
+| Model | Size | Quality | Speed |
+|-------|------|---------|-------|
+| `ggml-tiny` / `ggml-tiny.en` | 75MB | Basic | Fastest |
+| `ggml-base` / `ggml-base.en` | 142MB | Good | Fast |
+| `ggml-small` / `ggml-small.en` | 466MB | Better | Medium |
+| `ggml-medium` / `ggml-medium.en` | 1.5GB | High | Slower |
+| `ggml-large-v3-turbo` | 1.6GB | High | Fast |
+| `ggml-large-v3` | 2.9GB | Highest | Slowest |
+
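+Models ending in `.en` are English-only. They are the same size as their multilingual counterparts, slightly faster for English text, and tend to be more accurate on English audio, especially `tiny.en` and `base.en`.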
+
+## Platform Support
+
+- Linux with Wayland (primary)
+- Requires PipeWire for audio
+- X11 not supported (wtype is Wayland-only)
+
+## Build Information
+
+- **Version**: 0.1.0
+- **Type**: Shell script wrapper
+- **License**: MIT
+
+## Troubleshooting
+
+### Model Not Found
+
+Error: `Error: Model not found at /path/to/model`
+
+**Solution**: Download a model or use the Home Manager module:
+
+```bash
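+# Create the model directory first; curl -o fails if it does not exist
+mkdir -p ~/.local/share/stt-ptt/models
+curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
+  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin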
+```
+
+### No Audio Recorded
+
+**Solution**: Ensure PipeWire is running:
+
+```bash
+systemctl --user status pipewire
+```
+
+### Text Not Typed
+
+**Solution**: Ensure you're on Wayland and that wtype can type into the focused window:
+
+```bash
+# Check if running on Wayland
+echo "$XDG_SESSION_TYPE"  # Should print "wayland"
+
+# Check that wtype can type into the focused window
+wtype "test"
+```
+
+### Slow Transcription
+
+**Solution**: Use a smaller model or enable GPU acceleration. The examples below use the Home Manager module options:
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  model = "ggml-base.en";  # Smaller, faster model
+};
+```
+
+Or with GPU acceleration:
+
+```nix
+cli.stt-ptt = {
+  enable = true;
+  # Choose one:
+  whisperPackage = pkgs.whisper-cpp-vulkan;  # Vulkan (pre-built)
+  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };  # NVIDIA
+  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };  # AMD
+};
+```
+
+## Related
+
+- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
+- [Adding Packages](../guides/adding-packages.md) - How to add new packages