# stt-ptt

Push to Talk Speech to Text using Whisper.

## Description

stt-ptt is a simple push-to-talk speech-to-text tool built on whisper.cpp. It records audio via PipeWire, transcribes the recording with a local Whisper model, and types the result into the focused window using wtype (Wayland only).

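The transcribe-then-type half of that flow could be wired together roughly as follows. This is a minimal sketch, not the actual stt-ptt script: the function name `transcribe_and_type` and the `WHISPER_BIN` and `TYPER_BIN` variables are hypothetical, and the choice of `whisper-cli` flags is an assumption about how the script might call whisper.cpp.

```bash
# Hypothetical sketch: transcribe a recorded WAV file and type the result.
# WHISPER_BIN and TYPER_BIN are illustrative names, not options of stt-ptt.
WHISPER_BIN="${WHISPER_BIN:-whisper-cli}"
TYPER_BIN="${TYPER_BIN:-wtype}"

transcribe_and_type() {
  wav="$1"
  model="$2"
  # -nt drops timestamps, -np suppresses progress and log output.
  text="$("$WHISPER_BIN" -m "$model" -f "$wav" -nt -np)" || return 1
  # Collapse newlines and trim surrounding spaces before typing.
  text="$(printf '%s' "$text" | tr '\n' ' ' | sed 's/^ *//; s/ *$//')"
  # Nothing recognised: do not type anything.
  [ -n "$text" ] || return 1
  "$TYPER_BIN" "$text"
}
```

Keeping the two binaries behind variables makes it possible to try the function with stand-in commands before whisper.cpp or wtype are installed.
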
## Features

- **Push to Talk**: Start and stop recording with `stt-ptt start` and `stt-ptt stop`
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout are set via environment variables
- **Lightweight**: Minimal dependencies, no cloud services

## Installation

### Via Home Manager Module (Recommended)

See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.

### Via Overlay

```nix
{pkgs, ...}: {
  home.packages = [pkgs.stt-ptt];
}
```

### Direct Reference

```nix
# `inputs` must be passed to your modules, e.g. via Home Manager's extraSpecialArgs
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.system}.stt-ptt
  ];
}
```

## Usage

### Basic Commands

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe
stt-ptt stop
```

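Because `start` and `stop` are separate invocations, `stop` has to find the recorder that `start` launched. A PID file is one common way to share that state. The sketch below illustrates the pattern only: whether stt-ptt itself uses a PID file is an assumption, and `PIDFILE`, `RECORDER_CMD`, `ptt_start`, and `ptt_stop` are hypothetical names.

```bash
# Hypothetical sketch of start/stop dispatch via a PID file.
PIDFILE="${PIDFILE:-${XDG_RUNTIME_DIR:-/tmp}/stt-ptt-demo.pid}"
# Stand-in for the real recorder; a real tool would launch a PipeWire recorder here.
RECORDER_CMD="${RECORDER_CMD:-sleep 60}"

ptt_start() {
  # Refuse to start twice while a recorder is still alive.
  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "already recording" >&2
    return 1
  fi
  $RECORDER_CMD &   # unquoted on purpose: split into command and arguments
  echo "$!" > "$PIDFILE"
}

ptt_stop() {
  if [ ! -f "$PIDFILE" ]; then
    echo "not recording" >&2
    return 1
  fi
  kill "$(cat "$PIDFILE")" 2>/dev/null
  rm -f "$PIDFILE"
  # Transcription and typing would follow here.
}
```

The liveness check with `kill -0` keeps a stale PID file from a crashed recorder from blocking the next `start`.
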
### Keybinding Setup

stt-ptt is designed to be bound to a key: hold the key to record, release it to transcribe.

#### Hyprland

```nix
# In your Home Manager configuration
wayland.windowManager.hyprland.settings = {
  bind = [
    # Press Super+V to start recording
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    # Release Super+V to stop recording and transcribe
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

#### i3 (X11 - requires xdotool instead of wtype)

Note: stt-ptt uses wtype, which is Wayland-only. On X11 you would need to modify the script to use xdotool instead.

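One way to approach that modification is to choose the typing tool from the session type. The helpers below are a sketch of how such a patch might look, not part of stt-ptt; `pick_typer` and `type_text` are hypothetical names, and `xdotool type` is used as the X11 counterpart to `wtype`.

```bash
# Hypothetical helper: choose a typing backend from the session type.
pick_typer() {
  if [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
    echo "wtype"
  elif [ -n "${DISPLAY:-}" ]; then
    echo "xdotool"
  else
    echo "no graphical session detected" >&2
    return 1
  fi
}

# Hypothetical helper: type the given text with whichever backend applies.
type_text() {
  case "$(pick_typer)" in
    wtype)   wtype "$1" ;;
    xdotool) xdotool type -- "$1" ;;
    *)       return 1 ;;
  esac
}
```
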
### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Notification timeout in milliseconds | `3000` |

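
In shell terms, defaults like these behave like parameter expansion with a fallback. A minimal sketch, assuming the script resolves its settings this way (`resolve_config` is a hypothetical name; the variable names and default values are the ones documented for stt-ptt):

```bash
# Hypothetical sketch: apply the documented defaults when a variable is unset or empty.
resolve_config() {
  STT_MODEL="${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
  STT_NOTIFY_TIMEOUT="${STT_NOTIFY_TIMEOUT:-3000}"
}

resolve_config
echo "model:   $STT_MODEL"
echo "timeout: $STT_NOTIFY_TIMEOUT ms"
```

To override a value, export it in the environment that launches `stt-ptt`, for example `export STT_MODEL="$HOME/.local/share/stt-ptt/models/ggml-base.en.bin"`.
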
## Requirements

- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording

## Model Setup

Download a Whisper model from [Hugging Face](https://huggingface.co/ggerganov/whisper.cpp/tree/main):

```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models

# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

Or use the Home Manager module, which handles this automatically.

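After a manual download, a quick sanity check can catch an interrupted or failed transfer before the first transcription. The helper below is a sketch: `check_model` is a hypothetical name, and the 1 MiB threshold is an arbitrary assumption, chosen only because real model files are tens of megabytes or more.

```bash
# Hypothetical helper: verify that a model file exists and is plausibly complete.
check_model() {
  model="${1:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
  min_bytes=1048576   # 1 MiB; arbitrary lower bound, real models are far larger
  if [ ! -f "$model" ]; then
    echo "missing: $model" >&2
    return 1
  fi
  # Arithmetic expansion strips any padding that wc adds on some platforms.
  size=$(($(wc -c < "$model")))
  if [ "$size" -lt "$min_bytes" ]; then
    echo "suspiciously small ($size bytes): $model" >&2
    return 1
  fi
  echo "ok: $model"
}
```

A file of only a few kilobytes usually means an error page was saved instead of the model, in which case the download should be repeated.
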
## Available Models

| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75 MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142 MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466 MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5 GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6 GB | High | Fast |
| `ggml-large-v3` | 2.9 GB | Highest | Slowest |

Models ending in `.en` are English-only; for English speech they are typically a little faster and more accurate than the multilingual model of the same size.

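
The download URL shown in Model Setup follows a regular pattern, so other models can be fetched by substituting the model name. A small sketch (the helper name `model_url` is hypothetical; the URL pattern is taken from the download command in Model Setup and is assumed to hold for the other models in the same repository):

```bash
# Hypothetical helper: build the download URL for a given model name.
model_url() {
  echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/$1.bin"
}

# Example (commented out so that nothing is downloaded by accident):
# name="ggml-base.en"
# curl -L -o "$HOME/.local/share/stt-ptt/models/$name.bin" "$(model_url "$name")"
```

After downloading a different model, point `STT_MODEL` at the new file.
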
## Platform Support

- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)

## Build Information

- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT

## Troubleshooting

### Model Not Found

Error message: `Error: Model not found at /path/to/model`

**Solution**: Download a model or use the Home Manager module:

```bash
# Make sure the model directory exists before downloading
mkdir -p ~/.local/share/stt-ptt/models
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

### No Audio Recorded

**Solution**: Ensure PipeWire is running:

```bash
systemctl --user status pipewire
```

### Text Not Typed

**Solution**: Ensure you are in a Wayland session and that your compositor supports the virtual keyboard protocol that wtype relies on:

```bash
# Check if running on Wayland
echo "$XDG_SESSION_TYPE"  # Should print "wayland"
```

### Slow Transcription

**Solution**: Use a smaller model or enable GPU acceleration. With the Home Manager module, a smaller model is selected like this:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";  # Smaller, faster model
};
```

Or with GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  # Choose one:
  whisperPackage = pkgs.whisper-cpp-vulkan;  # Vulkan (pre-built)
  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };  # NVIDIA
  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };  # AMD
};
```

## Related

- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages