# stt-ptt

Push-to-talk speech-to-text using Whisper.

## Description

stt-ptt is a small push-to-talk speech-to-text tool built on whisper.cpp. It records audio through PipeWire, transcribes the recording locally with a Whisper model, and types the result into the focused window with wtype. Because wtype is Wayland-only, stt-ptt requires a Wayland session.

## Features

- **Push to Talk**: One command starts recording and another stops it and transcribes, which maps naturally onto a hold-to-talk keybinding
- **Local Processing**: Uses whisper.cpp for fast, offline transcription
- **Wayland Native**: Types transcribed text using wtype
- **Configurable**: Model path and notification timeout via environment variables
- **Lightweight**: Minimal dependencies, no cloud services

## Installation

### Via Home Manager Module (Recommended)

See [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) for the recommended setup with automatic model download.

### Via Overlay

With this repository's overlay applied to `pkgs`:

```nix
{pkgs, ...}: {
  home.packages = [pkgs.stt-ptt];
}
```

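### Direct Reference

This assumes the flake's `inputs` are passed to your Home Manager modules (for example through `extraSpecialArgs`):

```nix
{pkgs, inputs, ...}: {
  home.packages = [
    inputs.m3ta-nixpkgs.packages.${pkgs.stdenv.hostPlatform.system}.stt-ptt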
  ];
}
```

## Usage

### Basic Commands

```bash
# Start recording
stt-ptt start

# Stop recording and transcribe
stt-ptt stop
```

### Keybinding Setup

stt-ptt is meant to be bound to a single key. Pressing the key starts recording, and releasing it stops recording, transcribes the audio, and types the text.

#### Hyprland

```nix
# In your Home Manager configuration
wayland.windowManager.hyprland.settings = {
  bind = [
    # Press Super+V to start, release to stop and transcribe
    "SUPER, V, exec, stt-ptt start"
  ];
  bindr = [
    # Release trigger
    "SUPER, V, exec, stt-ptt stop"
  ];
};
```

Or in `hyprland.conf`:

```conf
bind = SUPER, V, exec, stt-ptt start
bindr = SUPER, V, exec, stt-ptt stop
```

#### Sway

```conf
# Hold to record, release to transcribe
bindsym --no-repeat $mod+v exec stt-ptt start
bindsym --release $mod+v exec stt-ptt stop
```

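#### i3 (X11, requires xdotool instead of wtype)

stt-ptt types text with wtype, which only works on Wayland. As a result, i3 and other X11 window managers are not supported as-is. To use stt-ptt on X11, the script would need to be modified to type with xdotool instead.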
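
The following is a minimal sketch of that modification. It assumes the script ends by passing the transcript to a single typing command. The function names `stt_pick_typer` and `stt_type_text` are hypothetical and are not part of stt-ptt. The sketch uses only the basic documented forms of `wtype` and `xdotool type`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of stt-ptt): choose a typing backend per session.

# Print the tool that should type text: wtype on Wayland, xdotool on X11.
# Wayland is checked first because XWayland also sets DISPLAY inside Wayland sessions.
stt_pick_typer() {
  if [[ -n "${WAYLAND_DISPLAY:-}" || "${XDG_SESSION_TYPE:-}" == "wayland" ]]; then
    echo wtype
  elif [[ -n "${DISPLAY:-}" || "${XDG_SESSION_TYPE:-}" == "x11" ]]; then
    echo xdotool
  else
    echo none
  fi
}

# Type the transcript into the focused window with the selected backend.
stt_type_text() {
  local text="$1"
  case "$(stt_pick_typer)" in
    wtype)   wtype "$text" ;;
    xdotool) xdotool type --clearmodifiers -- "$text" ;;
    *)       echo "No graphical session detected" >&2; return 1 ;;
  esac
}
```

On Wayland this keeps the current wtype behavior. On i3 it falls back to xdotool, which would also have to be added to the package's runtime dependencies.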

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `STT_MODEL` | Path to Whisper model file | `~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin` |
| `STT_NOTIFY_TIMEOUT` | Desktop notification timeout in milliseconds | `3000` |
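
Both variables are optional. Shell wrappers usually resolve defaults like these with `${VAR:-default}` parameter expansion. The sketch below assumes stt-ptt follows the same pattern. The function name `stt_resolve_config` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve stt-ptt settings, falling back to the documented defaults.
stt_resolve_config() {
  # ${VAR:-default} substitutes the default when VAR is unset or empty.
  STT_MODEL="${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
  STT_NOTIFY_TIMEOUT="${STT_NOTIFY_TIMEOUT:-3000}"
}

stt_resolve_config
printf 'model=%s\ntimeout=%sms\n' "$STT_MODEL" "$STT_NOTIFY_TIMEOUT"
```

The keybindings launch stt-ptt from the compositor, so overrides must be present in the compositor's environment, not only in an interactive shell. In `hyprland.conf`, for example, you can set `env = STT_NOTIFY_TIMEOUT,5000`.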

## Requirements

- **whisper-cpp**: Speech recognition engine
- **wtype**: Wayland text input (Wayland compositor required)
- **libnotify**: Desktop notifications
- **pipewire**: Audio recording

## Model Setup

Download a Whisper model from the whisper.cpp repository on [Hugging Face](https://huggingface.co/ggerganov/whisper.cpp/tree/main):

```bash
# Create model directory
mkdir -p ~/.local/share/stt-ptt/models

# Download model (example: large-v3-turbo)
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
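
To fetch a different model from the Available Models table, only the file name changes. The helper sketch below assumes the files keep the `<model-name>.bin` naming in that repository. `stt_model_url` and `stt_fetch_model` are hypothetical names and are not commands shipped with stt-ptt.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with stt-ptt) for fetching whisper.cpp models.

# Print the download URL for a model name from the table, e.g. "ggml-base.en".
stt_model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s.bin\n' "$1"
}

# Download a model into the default stt-ptt model directory, skipping existing files.
stt_fetch_model() {
  local name="$1"
  local dir="$HOME/.local/share/stt-ptt/models"
  local dest="$dir/$name.bin"
  mkdir -p "$dir" || return 1
  if [[ -s "$dest" ]]; then
    echo "Already present: $dest"
    return 0
  fi
  # -f makes curl fail on HTTP errors instead of saving an error page as the model;
  # writing to a .part file keeps an interrupted download from looking complete.
  if curl -fL -o "$dest.part" "$(stt_model_url "$name")"; then
    mv "$dest.part" "$dest"
  else
    rm -f "$dest.part"
    return 1
  fi
}

# Example: stt_fetch_model ggml-base.en
```

If you download any model other than the default `ggml-large-v3-turbo.bin`, point `STT_MODEL` at the new file.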

Or use the Home Manager module which handles this automatically.

## Available Models

| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| `ggml-tiny` / `ggml-tiny.en` | 75MB | Basic | Fastest |
| `ggml-base` / `ggml-base.en` | 142MB | Good | Fast |
| `ggml-small` / `ggml-small.en` | 466MB | Better | Medium |
| `ggml-medium` / `ggml-medium.en` | 1.5GB | High | Slower |
| `ggml-large-v3-turbo` | 1.6GB | High | Fast |
| `ggml-large-v3` | 2.9GB | Highest | Slowest |

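Models ending in `.en` are English-only. On English speech they tend to be slightly more accurate than the multilingual model of the same size, most noticeably `tiny.en` and `base.en`.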

## Platform Support

- Linux with Wayland (primary)
- Requires PipeWire for audio
- X11 not supported (wtype is Wayland-only)

## Build Information

- **Version**: 0.1.0
- **Type**: Shell script wrapper
- **License**: MIT

## Troubleshooting

### Model Not Found

**Error message**: `Error: Model not found at /path/to/model`

**Solution**: Download a model, point `STT_MODEL` at an existing model file, or use the Home Manager module. To download the default model:

```bash
curl -L -o ~/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```
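
If the error persists after downloading, check that the path the script sees is a non-empty regular file. This is a minimal check sketch. `stt_check_model` is a hypothetical helper that reproduces the message above. It assumes the same default path as the Environment Variables table.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring stt-ptt's "Model not found" error.
stt_check_model() {
  local model="${STT_MODEL:-$HOME/.local/share/stt-ptt/models/ggml-large-v3-turbo.bin}"
  if [[ ! -f "$model" ]]; then
    echo "Error: Model not found at $model" >&2
    return 1
  fi
  # A zero-byte file usually means an interrupted or failed download.
  if [[ ! -s "$model" ]]; then
    echo "Error: Model file is empty: $model" >&2
    return 1
  fi
  echo "OK: $model"
}
```

A common cause is a quoted tilde: `STT_MODEL="~/models/x.bin"` keeps a literal `~`. Use `$HOME` or an absolute path instead.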

### No Audio Recorded

**Solution**: Ensure PipeWire is running:

```bash
systemctl --user status pipewire
```
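
If PipeWire is running but transcriptions come back empty, record a short test clip directly and check that samples actually arrived. This is a sketch that assumes `pw-record` (shipped with PipeWire) and the default input device. The `stt_check_clip` helper and its 4096-byte threshold are illustrative assumptions, not values used by stt-ptt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a recorded clip holds more than a bare WAV header.
# 4096 bytes is an arbitrary threshold; one second of 16 kHz mono 16-bit audio is 32000 bytes.
stt_check_clip() {
  local clip="$1" size
  [[ -f "$clip" ]] || { echo "No file written: $clip" >&2; return 1; }
  size=$(wc -c < "$clip")
  if (( size < 4096 )); then
    echo "Clip is only $size bytes; no audio samples were captured" >&2
    return 1
  fi
  echo "Captured $size bytes"
}

# Usage: record about 3 seconds from the default source (SIGINT mimics Ctrl-C), then inspect.
#   timeout -s INT 3 pw-record --rate 16000 --channels 1 /tmp/stt-test.wav
#   stt_check_clip /tmp/stt-test.wav
```

A muted or wrong source still writes silent samples, so a clip can pass the size check and still be silent. Play it back with `pw-play /tmp/stt-test.wav`. If you hear nothing, run `wpctl status` to see which source is the default.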

### Text Not Typed

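**Solution**: Make sure you are in a Wayland session and that your compositor supports the virtual keyboard protocol wtype relies on. Sway and Hyprland support it; GNOME's Mutter does not.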

```bash
# Check if running on Wayland
echo $XDG_SESSION_TYPE  # Should print "wayland"
```
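
A slightly broader check is sketched below. `stt_check_typing` is a hypothetical helper. It only confirms the preconditions (a Wayland session, a Wayland display socket, and `wtype` on `PATH`). It cannot tell whether the compositor implements the virtual keyboard protocol, so also run `wtype hello` with a text field focused.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: report why wtype might be unable to type text.
stt_check_typing() {
  local status=0
  if [[ "${XDG_SESSION_TYPE:-}" != "wayland" ]]; then
    echo "Session type is '${XDG_SESSION_TYPE:-unset}', not 'wayland'"
    status=1
  fi
  if [[ -z "${WAYLAND_DISPLAY:-}" ]]; then
    echo "WAYLAND_DISPLAY is not set in this environment"
    status=1
  fi
  if ! command -v wtype >/dev/null 2>&1; then
    echo "wtype is not on PATH"
    status=1
  fi
  return "$status"
}
```

Run the check from the same environment your keybindings use. A terminal launched by the compositor is a good approximation.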

### Slow Transcription

**Solution**: Use a smaller model or enable GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  model = "ggml-base.en";  # Smaller, faster model
};
```

Or with GPU acceleration:

```nix
cli.stt-ptt = {
  enable = true;
  # Choose one:
  whisperPackage = pkgs.whisper-cpp-vulkan;  # Vulkan (pre-built)
  # whisperPackage = pkgs.whisper-cpp.override { cudaSupport = true; };  # NVIDIA
  # whisperPackage = pkgs.whisper-cpp.override { rocmSupport = true; };  # AMD
};
```

## Related

- [stt-ptt Home Manager Module](../modules/home-manager/cli/stt-ptt.md) - Module documentation
- [Adding Packages](../guides/adding-packages.md) - How to add new packages