# stt-ptt

Push to Talk Speech to Text using Whisper.

## Description

stt-ptt is a simple push-to-talk speech-to-text tool that uses whisper.cpp for transcription. It records audio via PipeWire, transcribes it using a local Whisper model, and types the result using wtype (Wayland).

## Features

- **Push to Talk**: Start/stop recording with simple commands
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout via environment variables
- **Lightweight**: Minimal dependencies, no cloud services

## Installation

### Via Home Manager Module (Recommended)

See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.

### Via Overlay

```nix
{pkgs, ...}: {
  home.packages = [pkgs.stt-ptt];
}
```

### Direct Reference

```nix
# `inputs` must be passed to your Home Manager modules (e.g. via extraSpecialArgs)
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.system}.stt-ptt
  ];
}
```

## Usage

### Basic Commands

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe
stt-ptt stop
```

### Keybinding Setup

The tool is designed to be bound to a key: hold the key to record, release it to transcribe.

#### Hyprland

```nix
# In your Hyprland config
wayland.windowManager.hyprland.settings = {
  bind = [
    # Press Super+V to start recording
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    # Release Super+V to stop and transcribe
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

#### i3 (X11 - requires xdotool instead of wtype)

Note: stt-ptt uses wtype, which is Wayland-only. To use it on X11, you would need to modify the script to call xdotool instead.
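
As a rough illustration of such a modification, the sketch below picks a typing command based on the session type. The function names `typing_tool` and `type_text` are hypothetical and not part of stt-ptt, and the sketch assumes the script's final step is a single call that types the transcript.

```bash
# Hypothetical helpers, not part of stt-ptt.

# Print the name of the typing tool that matches the current session type.
typing_tool() {
  case "${XDG_SESSION_TYPE:-}" in
    wayland) echo "wtype" ;;
    x11)     echo "xdotool" ;;
    *)       return 1 ;;  # unknown session type
  esac
}

# Type the given text into the focused window with whichever tool applies.
type_text() {
  case "$(typing_tool)" in
    wtype)   printf '%s' "$1" | wtype - ;;  # "-" makes wtype read the text from stdin
    xdotool) xdotool type -- "$1" ;;
    *)       echo "Unsupported session type: ${XDG_SESSION_TYPE:-unset}" >&2; return 1 ;;
  esac
}
```

With these in place, `type_text "hello world"` would type into the focused window on either display server. Other parts of the script may still need adapting for X11.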

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Notification timeout in ms | `3000` |
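
To override a default, export the variable before invoking the tool, or set it in the environment of your keybinding. The sketch below also shows one common way a shell script can fall back to a default when a variable is unset. It is an assumption about how such a wrapper might resolve its settings, not a copy of the actual stt-ptt script, and the function names are hypothetical.

```bash
# Hypothetical helper: print STT_MODEL if set and non-empty, otherwise the documented default.
resolve_model() {
  echo "${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
}

# Same pattern for the notification timeout (milliseconds).
resolve_notify_timeout() {
  echo "${STT_NOTIFY_TIMEOUT:-3000}"
}

# Example override: use a smaller model and a shorter notification.
export STT_MODEL="$HOME/.local/share/stt-ptt/models/ggml-base.en.bin"
export STT_NOTIFY_TIMEOUT=1500

resolve_model
resolve_notify_timeout
```

Unsetting either variable makes the corresponding helper print the default from the table again.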

## Requirements

- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording
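
When installing outside the Home Manager module, a quick check like the following can confirm that the required tools are on your `PATH`. The binary names are assumptions: `whisper-cli` for whisper-cpp, `pw-record` for pipewire, and `notify-send` for libnotify. Adjust them if your installation differs. The `missing_commands` helper is hypothetical.

```bash
# Hypothetical helper: print each command from the argument list that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Assumed binary names; adjust to match your installation.
missing="$(missing_commands whisper-cli wtype notify-send pw-record)"
if [ -n "$missing" ]; then
  echo "Missing commands:" $missing
else
  echo "All required commands found."
fi
```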

## Model Setup

Download a Whisper model from [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main):

```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models

# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

Alternatively, use the Home Manager module, which downloads the model automatically.

## Available Models

| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75 MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142 MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466 MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5 GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6 GB | High | Fast |
| `ggml-large-v3` | 2.9 GB | Highest | Slowest |

Models ending in `.en` are English-only and slightly faster for English text.
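
To fetch a different model from the table, the download command from Model Setup can be parameterized by model name. This sketch assumes every model follows the same `<name>.bin` URL pattern as the `ggml-large-v3-turbo` example; `model_url` and `download_model` are hypothetical helpers, not part of stt-ptt.

```bash
# Hypothetical helper: build the download URL for a model name from the table above.
model_url() {
  echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/$1.bin"
}

# Hypothetical helper: download the named model into the default model directory.
download_model() {
  dest="$HOME/.local/share/stt-ptt/models"
  mkdir -p "$dest"
  # --fail makes curl exit non-zero on HTTP errors instead of saving an error page
  curl -L --fail -o "$dest/$1.bin" "$(model_url "$1")"
}

# Example (not run here): download_model ggml-base.en
```

After downloading, point `STT_MODEL` at the new file so stt-ptt picks it up.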

## Platform Support

- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)

## Build Information

- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT

## Troubleshooting

### Model Not Found

Error: `Error: Model not found at /path/to/model`

**Solution**: Download a model or use the Home Manager module:

```bash
# Make sure the target directory exists before downloading
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

### No Audio Recorded

**Solution**: Ensure PipeWire is running:

```bash
systemctl --user status pipewire
```

### Text Not Typed

**Solution**: Ensure you're on Wayland and wtype has access:

```bash
# Check if running on Wayland
echo "$XDG_SESSION_TYPE"  # Should print "wayland"
```

### Slow Transcription

**Solution**: Use a smaller model or enable GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";  # Smaller, faster model
};
```

Or with GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  # Choose one:
  whisperPackage = pkgs.whisper-cpp-vulkan;  # Vulkan (pre-built)
  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };  # NVIDIA
  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };  # AMD
};
```

## Related

- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages