feat: add stt-ptt package

m3tm3re
2026-01-02 12:24:48 +01:00
parent 44485c4c72
commit de1301e08d
8 changed files with 670 additions and 0 deletions


@@ -34,6 +34,7 @@ Documentation for all custom packages:
- [mem0](./packages/mem0.md) - AI memory assistant with vector storage
- [msty-studio](./packages/msty-studio.md) - Msty Studio application
- [pomodoro-timer](./packages/pomodoro-timer.md) - Pomodoro timer utility
- [stt-ptt](./packages/stt-ptt.md) - Push to Talk Speech to Text using Whisper
- [tuxedo-backlight](./packages/tuxedo-backlight.md) - Backlight control for Tuxedo laptops
- [zellij-ps](./packages/zellij-ps.md) - Project switcher for Zellij
@@ -49,6 +50,7 @@ Configuration modules for NixOS and Home Manager:
#### Home Manager Modules
- [Overview](./modules/home-manager/overview.md) - Home Manager modules overview
- [CLI Tools](./modules/home-manager/cli/) - CLI-related modules
- [stt-ptt](./modules/home-manager/cli/stt-ptt.md) - Push to Talk Speech to Text
- [zellij-ps](./modules/home-manager/cli/zellij-ps.md) - Zellij project switcher
- [Coding](./modules/home-manager/coding/) - Development-related modules
- [editors](./modules/home-manager/coding/editors.md) - Editor configurations


@@ -0,0 +1,265 @@
# stt-ptt Home Manager Module
Push to Talk Speech to Text for Home Manager.
## Overview
This module configures stt-ptt, a push-to-talk speech-to-text tool using whisper.cpp. It handles model downloads, environment configuration, and package installation.
## Quick Start
```nix
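{inputs, ...}: {
  # Assumes the flake input is named m3ta-nixpkgs and `inputs` is passed to Home Manager modules
  imports = [inputs.m3ta-nixpkgs.homeManagerModules.default];
  cli.stt-ptt = {
    enable = true;
  };
}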
```
This will:
- Install stt-ptt with the default `whisper-cpp` package
- Download the `ggml-large-v3-turbo` model on first activation
- Set environment variables for model path and notification timeout
## Module Options
### `cli.stt-ptt.enable`
Enable the stt-ptt module.
- Type: `boolean`
- Default: `false`
### `cli.stt-ptt.whisperPackage`
The whisper-cpp package to use for transcription.
- Type: `package`
- Default: `pkgs.whisper-cpp`
**Pre-built variants:**
```nix
# CPU (default)
whisperPackage = pkgs.whisper-cpp;
# Vulkan GPU acceleration (pre-built)
whisperPackage = pkgs.whisper-cpp-vulkan;
```
**Override options** (can be combined):
| Option | Description |
|--------|-------------|
| `cudaSupport` | NVIDIA CUDA acceleration |
| `rocmSupport` | AMD ROCm acceleration |
| `vulkanSupport` | Vulkan GPU acceleration |
| `coreMLSupport` | Apple CoreML (macOS only) |
| `metalSupport` | Apple Metal (macOS ARM only) |
```nix
# NVIDIA CUDA support
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
# AMD ROCm support
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
# Vulkan support (manual override)
whisperPackage = pkgs.whisper-cpp.override { vulkanSupport = true; };
```
### `cli.stt-ptt.model`
The Whisper model to use. Models are automatically downloaded from HuggingFace on first activation.
- Type: `string`
- Default: `"ggml-large-v3-turbo"`
Available models (roughly sorted by size; the recommended default is listed last):
| Model | Size | Notes |
|-------|------|-------|
| `ggml-tiny` | 75MB | Fastest, lowest quality |
| `ggml-tiny.en` | 75MB | English-only, slightly faster |
| `ggml-base` | 142MB | Fast, basic quality |
| `ggml-base.en` | 142MB | English-only |
| `ggml-small` | 466MB | Balanced speed/quality |
| `ggml-small.en` | 466MB | English-only |
| `ggml-medium` | 1.5GB | Good quality |
| `ggml-medium.en` | 1.5GB | English-only |
| `ggml-large-v1` | 2.9GB | High quality (original) |
| `ggml-large-v2` | 2.9GB | High quality (improved) |
| `ggml-large-v3` | 2.9GB | Highest quality |
| `ggml-large-v3-turbo` | 1.6GB | High quality, optimized speed (recommended) |
Quantized variants are also available for some models and reduce download size. They use the suffixes `-q5_0`, `-q5_1`, and `-q8_0`, for example `ggml-large-v3-turbo-q5_0`.
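The model names above map directly onto file names in the HuggingFace repository used by the manual download command in the Troubleshooting section. As a rough sketch (the helper names `model_url`, `model_path`, and `fetch_model` are hypothetical and not part of stt-ptt or this module), a manual download for any model name could look like this:

```bash
# Hypothetical helpers for fetching a model by name; not part of stt-ptt.

# URL pattern taken from the manual download command in this document.
model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s.bin\n' "$1"
}

# Local path where the module stores models, per the File Locations table.
model_path() {
  printf '%s/.local/share/stt-ptt/models/%s.bin\n' "$HOME" "$1"
}

# Download the model unless a file already exists at the target path.
fetch_model() {
  local dest
  dest="$(model_path "$1")"
  if [ -f "$dest" ]; then
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  curl -L --fail -o "$dest" "$(model_url "$1")"
}
```

For example, `fetch_model ggml-small.en` would place the file where `model = "ggml-small.en"` expects it, assuming the module does not verify anything beyond the file's presence.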
### `cli.stt-ptt.notifyTimeout`
Notification timeout in milliseconds for the recording indicator.
- Type: `integer`
- Default: `3000`
- Example: `5000` (5 seconds), `0` (persistent)
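How the script consumes this value is not shown in this document. As a sketch under that caveat, a notification helper could read `STT_NOTIFY_TIMEOUT` and pass it to `notify-send -t`, which takes an expiry time in milliseconds. The function names `notify_timeout` and `notify_recording` are hypothetical:

```bash
# Hypothetical sketch; the real stt-ptt script may handle notifications differently.

# Resolve the timeout: use STT_NOTIFY_TIMEOUT if set and non-empty, else 3000 ms.
notify_timeout() {
  printf '%s\n' "${STT_NOTIFY_TIMEOUT:-3000}"
}

# Show a recording indicator via libnotify's notify-send.
# A timeout of 0 asks the notification daemon not to expire the notification.
notify_recording() {
  notify-send -t "$(notify_timeout)" "stt-ptt" "Recording..."
}
```

Whether `0` really keeps the notification on screen depends on the notification daemon, since some daemons ignore the requested expiry time.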
## Usage
After enabling the module, bind `stt-ptt start` to a key press and `stt-ptt stop` to the release of the same key:
```bash
# Start recording
stt-ptt start
# Stop recording and transcribe (types result)
stt-ptt stop
```
### Keybinding Examples
#### Hyprland
```nix
wayland.windowManager.hyprland.settings = {
bind = [
"SUPER, V, exec, stt-ptt start"
];
bindr = [
"SUPER, V, exec, stt-ptt stop"
];
};
```
Or in `hyprland.conf`:
```conf
# Press to start recording, release to transcribe
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```
#### Sway
```conf
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```
## Configuration Examples
### Basic Setup
```nix
cli.stt-ptt = {
enable = true;
};
```
### Fast English Transcription
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-base.en";
notifyTimeout = 2000;
};
```
### High Quality with NVIDIA GPU
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-large-v3";
whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };
};
```
### Vulkan GPU Acceleration
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-large-v3-turbo";
whisperPackage = pkgs.whisper-cpp-vulkan;
};
```
### AMD GPU with ROCm
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-large-v3-turbo";
whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };
};
```
### Balanced Setup
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-small";
notifyTimeout = 3000;
};
```
## File Locations
| Path | Description |
|------|-------------|
| `~/.local/share/stt-ptt/models/` | Downloaded Whisper models |
| `~/.cache/stt-ptt/stt.wav` | Temporary audio recording |
| `~/.cache/stt-ptt/stt.pid` | PID file for recording process |
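The PID file makes it possible to tell from the outside whether a recording is in progress. The sketch below assumes the file contains the PID of the recording process and that the process is gone once recording stops; `is_recording` and `stt_toggle` are hypothetical helpers, not commands shipped with stt-ptt:

```bash
# Hypothetical helpers built on the PID file listed above; not part of stt-ptt.

# Succeeds if the PID file exists and names a live process.
is_recording() {
  local pid_file="${1:-$HOME/.cache/stt-ptt/stt.pid}" pid
  [ -f "$pid_file" ] || return 1
  pid="$(cat "$pid_file")"
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Single-key toggle for setups without key-release bindings.
stt_toggle() {
  if is_recording; then
    stt-ptt stop
  else
    stt-ptt start
  fi
}
```

A toggle like this trades the hold-to-talk behaviour for a press-to-start, press-to-stop workflow, which may suit compositors or input devices that cannot report key release.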
## Environment Variables
The module sets these automatically:
| Variable | Value |
|----------|-------|
| `STT_MODEL` | `~/.local/share/stt-ptt/models/<model>.bin` |
| `STT_NOTIFY_TIMEOUT` | Configured timeout in ms |
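To confirm that the variables reached your session after activation, a small check along these lines may help. `check_stt_model` is a hypothetical helper and only tests that the file named by `STT_MODEL` exists and is non-empty:

```bash
# Hypothetical helper; not part of stt-ptt.
# Reports whether STT_MODEL is set and points to a non-empty file.
check_stt_model() {
  if [ -z "${STT_MODEL:-}" ]; then
    echo "STT_MODEL is not set"
    return 1
  fi
  if [ ! -s "$STT_MODEL" ]; then
    echo "model missing or empty: $STT_MODEL"
    return 1
  fi
  echo "model ok: $STT_MODEL"
}
```

If the variable is unset in a terminal opened before `home-manager switch`, logging out and back in is usually needed so that the session picks up the new environment.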
## Requirements
- Wayland compositor (wtype is Wayland-only)
- PipeWire for audio recording
- Desktop notification daemon
## Troubleshooting
### Model Download Failed
The model downloads on first `home-manager switch`. If it fails:
```bash
# Manual download
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
### Transcription Too Slow
Use a smaller model or enable GPU acceleration:
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-tiny.en"; # Much faster
};
```
### Text Not Appearing
1. Ensure you're on Wayland: `echo $XDG_SESSION_TYPE`
2. Check if wtype works: `wtype "test"`
3. Some apps may need focus; try clicking the text field first
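The first two checks can be combined into one helper. This is a minimal sketch; `check_typing_env` is a hypothetical name, and it only verifies the session type and that `wtype` is on `PATH`, not that the compositor accepts its input:

```bash
# Hypothetical diagnostic helper; not part of stt-ptt.
# Prints one line per problem found and returns non-zero if any check fails.
check_typing_env() {
  local status=0
  if [ "${XDG_SESSION_TYPE:-}" != "wayland" ]; then
    echo "not a Wayland session (XDG_SESSION_TYPE=${XDG_SESSION_TYPE:-unset})"
    status=1
  fi
  if ! command -v wtype >/dev/null 2>&1; then
    echo "wtype not found in PATH"
    status=1
  fi
  return "$status"
}
```

Run `check_typing_env && wtype "test"` in a terminal; if the helper passes but nothing is typed, the focused application or the compositor is the more likely cause.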
## Related
- [stt-ptt Package](../../../packages/stt-ptt.md) - Package documentation
- [Using Modules Guide](../../../guides/using-modules.md) - Module usage patterns

docs/packages/stt-ptt.md

@@ -0,0 +1,202 @@
# stt-ptt
Push to Talk Speech to Text using Whisper.
## Description
stt-ptt is a simple push-to-talk speech-to-text tool that uses whisper.cpp for transcription. It records audio via PipeWire, transcribes it using a local Whisper model, and types the result using wtype (Wayland).
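The record, transcribe, and type flow described above can be illustrated roughly as follows. This is not the actual stt-ptt script: the function names and the `STT_CACHE_DIR` variable are hypothetical, the whisper.cpp binary is assumed to be named `whisper-cli` (as in recent releases), and the flags shown are the ones `pw-record`, `whisper-cli`, and `wtype` document for this purpose.

```bash
# Illustrative sketch of the record -> transcribe -> type flow.
# NOT the actual stt-ptt script; names and STT_CACHE_DIR are hypothetical.
STT_CACHE_DIR="${STT_CACHE_DIR:-$HOME/.cache/stt-ptt}"
STT_MODEL="${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"

sketch_start() {
  mkdir -p "$STT_CACHE_DIR"
  # Record 16 kHz, mono, 16-bit audio via PipeWire in the background,
  # a format whisper.cpp accepts.
  pw-record --rate 16000 --channels 1 --format s16 "$STT_CACHE_DIR/stt.wav" &
  # Remember the recorder's PID so a later "stop" can end it.
  echo "$!" > "$STT_CACHE_DIR/stt.pid"
}

sketch_stop() {
  local pid
  pid="$(cat "$STT_CACHE_DIR/stt.pid")"
  kill "$pid" 2>/dev/null || true
  # Reap the recorder if it is our child, then poll in case it is not.
  wait "$pid" 2>/dev/null || true
  while kill -0 "$pid" 2>/dev/null; do sleep 0.1; done
  rm -f "$STT_CACHE_DIR/stt.pid"
  # Transcribe without timestamps or log output, then type the text from stdin.
  whisper-cli -m "$STT_MODEL" -f "$STT_CACHE_DIR/stt.wav" -nt -np | wtype -
}
```

Splitting the work into a start and a stop step is what allows the two halves to be bound to key press and key release.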
## Features
- **Push to Talk**: Start/stop recording with simple commands
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout via environment variables
- **Lightweight**: Minimal dependencies, no cloud services
## Installation
### Via Home Manager Module (Recommended)
See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.
### Via Overlay
```nix
{pkgs, ...}: {
home.packages = [pkgs.stt-ptt];
}
```
### Direct Reference
```nix
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.system}.stt-ptt
  ];
}
```
## Usage
### Basic Commands
```bash
# Start recording
stt-ptt start
# Stop recording and transcribe
stt-ptt stop
```
### Keybinding Setup
The tool is designed to be bound to a key (e.g., hold to record, release to transcribe).
#### Hyprland
```nix
# In your Hyprland config
wayland.windowManager.hyprland.settings = {
bind = [
# Press Super+V to start, release to stop and transcribe
"SUPER, V, exec, stt-ptt start"
];
bindr = [
# Release trigger
"SUPER, V, exec, stt-ptt stop"
];
};
```
Or in `hyprland.conf`:
```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```
#### Sway
```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```
#### i3 (X11 - requires xdotool instead of wtype)
Note: stt-ptt uses wtype, which is Wayland-only, so it does not work on X11 as shipped. To use it there, you would need to modify the script to type with xdotool instead.
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Notification timeout in ms | `3000` |
## Requirements
- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording
## Model Setup
Download a Whisper model from [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main):
```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models
# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
Or use the Home Manager module which handles this automatically.
## Available Models
| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6GB | High | Fast |
| `ggml-large-v3` | 2.9GB | Highest | Slowest |
Models ending in `.en` are English-only and slightly faster when transcribing English speech.
## Platform Support
- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)
## Build Information
- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT
## Troubleshooting
### Model Not Found
Error: `Error: Model not found at /path/to/model`
**Solution**: Download a model or use the Home Manager module:
```bash
# Create the model directory first; curl does not create it
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
### No Audio Recorded
**Solution**: Ensure PipeWire is running:
```bash
systemctl --user status pipewire
```
### Text Not Typed
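**Solution**: Ensure you are running a Wayland session, that `wtype` is installed, and that your compositor supports the virtual keyboard protocol `wtype` relies on: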
```bash
# Check if running on Wayland
echo $XDG_SESSION_TYPE # Should print "wayland"
```
### Slow Transcription
**Solution**: Use a smaller model or enable GPU acceleration. With the Home Manager module, a smaller model is selected like this:
```nix
cli.stt-ptt = {
enable = true;
model = "ggml-base.en"; # Smaller, faster model
};
```
Or with GPU acceleration:
```nix
cli.stt-ptt = {
enable = true;
# Choose one:
whisperPackage = pkgs.whisper-cpp-vulkan; # Vulkan (pre-built)
# whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; }; # NVIDIA
# whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; }; # AMD
};
```
## Related
- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages