feat: add stt-ptt package

m3tm3re
2026-01-02 12:24:48 +01:00
parent 44485c4c72
commit de1301e08d
8 changed files with 670 additions and 0 deletions

docs/packages/stt-ptt.md Normal file

@@ -0,0 +1,202 @@
# stt-ptt
Push-to-talk speech-to-text using Whisper.
## Description
stt-ptt is a simple push-to-talk speech-to-text tool that uses whisper.cpp for transcription. It records audio via PipeWire, transcribes it using a local Whisper model, and types the result using wtype (Wayland).
## Features
- **Push to Talk**: Start/stop recording with simple commands
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout via environment variables
- **Lightweight**: Minimal dependencies, no cloud services
## Installation
### Via Home Manager Module (Recommended)
See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.
### Via Overlay
```nix
{pkgs, ...}: {
  home.packages = [pkgs.stt-ptt];
}
```
### Direct Reference
```nix
# `inputs` must be passed to your modules (e.g. via `extraSpecialArgs`)
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.system}.stt-ptt
  ];
}
```
## Usage
### Basic Commands
```bash
# Start recording
stt-ptt start
# Stop recording and transcribe
stt-ptt stop
```
### Keybinding Setup
The tool is designed to be bound to a key (e.g., hold to record, release to transcribe).
#### Hyprland
```nix
# In your Hyprland config
wayland.windowManager.hyprland.settings = {
  bind = [
    # Press Super+V to start, release to stop and transcribe
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    # Release trigger
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```
Or in `hyprland.conf`:
```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```
#### Sway
```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```
#### i3 (X11 - requires xdotool instead of wtype)
Note: stt-ptt types text with wtype, which is Wayland-only, so it does not work on i3 or other X11 window managers as shipped. Supporting X11 would require modifying the script to use xdotool instead.
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Notification timeout in ms | `3000` |
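
As an illustration of how the defaults in the table apply, the sketch below resolves both variables with shell parameter expansion. `resolve_model` and `resolve_timeout` are hypothetical helper names used only for this example; the actual stt-ptt script may resolve its settings differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of stt-ptt): print the effective value of
# each setting, falling back to the documented default when it is unset.

resolve_model() {
  printf '%s\n' "${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
}

resolve_timeout() {
  printf '%s\n' "${STT_NOTIFY_TIMEOUT:-3000}"
}

# Example override for the current shell session (commented out):
# export STT_MODEL="$HOME/.local/share/stt-ptt/models/ggml-base.en.bin"
# export STT_NOTIFY_TIMEOUT=1500
```

Exporting the variables in the environment your compositor starts from should make them visible to both `stt-ptt start` and `stt-ptt stop` when those are launched from keybindings.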
## Requirements
- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording
## Model Setup
Download a Whisper model from [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main):
```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models
# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
Or use the Home Manager module which handles this automatically.
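
A failed download can leave a tiny error page where the model should be. The following is a minimal sanity check, assuming that any real model is far larger than 1 MiB (the smallest one listed in this document is 75MB). `model_looks_valid` is a hypothetical helper, not something stt-ptt provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file exists and is at least
# min_bytes large (default 1 MiB). It does not validate the file format.
model_looks_valid() {
  local path="$1" min_bytes="${2:-1048576}" size
  [ -f "$path" ] || return 1
  # Arithmetic expansion strips any padding that wc may emit.
  size=$(( $(wc -c < "$path") ))
  [ "$size" -ge "$min_bytes" ]
}

# Example (commented out):
# model_looks_valid ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
#   || echo "Model file is missing or suspiciously small" >&2
```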
## Available Models
| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75 MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142 MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466 MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5 GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6 GB | High | Fast |
| `ggml-large-v3` | 2.9 GB | Highest | Slowest |
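Models ending in `.en` are English-only. They are the same size as their multilingual counterparts and tend to be more accurate on English audio, especially at the smaller sizes.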
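
To fetch a model other than the default, the download URL and local path can be derived from the model name. This sketch assumes the other models follow the same URL pattern and directory as the `ggml-large-v3-turbo` download in Model Setup; `model_url` and `model_path` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: map a model name from the table (without the
# .bin suffix) to its assumed download URL and local file path.

model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s.bin\n' "$1"
}

model_path() {
  printf '%s/.local/share/stt-ptt/models/%s.bin\n' "$HOME" "$1"
}

# Example (commented out because it downloads a file):
# name=ggml-base.en
# mkdir -p "$(dirname "$(model_path "$name")")"
# curl -fL -o "$(model_path "$name")" "$(model_url "$name")"
# export STT_MODEL="$(model_path "$name")"
```

The `-f` flag makes curl exit with an error on an HTTP failure instead of saving the error page as the model file.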
## Platform Support
- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)
## Build Information
- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT
## Troubleshooting
### Model Not Found
Error: `Error: Model not found at /path/to/model`
**Solution**: Download a model or use the Home Manager module:
```bash
# Make sure the model directory exists before downloading
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
### No Audio Recorded
**Solution**: Ensure PipeWire is running:
```bash
systemctl --user status pipewire
```
### Text Not Typed
**Solution**: Ensure you're in a Wayland session and that your compositor supports the virtual-keyboard protocol that wtype relies on:
```bash
# Check if running on Wayland
echo $XDG_SESSION_TYPE # Should print "wayland"
```
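
The session check can also be scripted. The sketch below treats the session as Wayland if either `XDG_SESSION_TYPE` is `wayland` or `WAYLAND_DISPLAY` is set; `is_wayland_session` is a hypothetical helper, and the two-variable heuristic is an assumption rather than something stt-ptt itself does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the environment looks like a
# Wayland session, based on two commonly set variables.
is_wayland_session() {
  [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]
}

# Example (commented out): type a test string only in a Wayland session.
# if is_wayland_session; then
#   wtype "hello from wtype"
# else
#   echo "Not a Wayland session; wtype will not work here" >&2
# fi
```

If the check passes but `wtype` still fails when run by hand, its error message usually indicates whether the compositor is the limiting factor.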
### Slow Transcription
**Solution**: Use a smaller model or enable GPU acceleration. With the Home Manager module, select a smaller model:
```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en"; # Smaller, faster model
};
```
Or with GPU acceleration:
```nix
cli.stt-ptt = {
  enable = true;
  # Choose one:
  whisperPackage = pkgs.whisper-cpp-vulkan; # Vulkan (pre-built)
  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; }; # NVIDIA
  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; }; # AMD
};
```
## Related
- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages