offbeatengineer.com
Project 01 · active

Vocal.

A single-binary voice toolkit — speech-to-text, text-to-speech, and voice cloning. No cloud, no Python runtime, no Node.js. Runs on your GPU.

c · c++ · ggml · qwen3 · metal · cuda

Vocal is one small CLI that does speech-to-text, text-to-speech, and voice cloning on your own machine. No API keys, no uploads, no Python or Node runtime to babysit. A single binary, Qwen3 models, GGML under the hood, and GPU acceleration on whatever silicon you’ve got.

Why it exists

Every voice tool I tried either (a) wanted an API key and my audio on someone else’s server, or (b) was a research demo that needed three Python environments and a reboot to turn into anything useful. Vocal is the middle ground: a grown-up version of the demo, without the cloud and without the tooling tax. I built it because I kept writing the same glue code and was tired of pretending that wasn’t the real project.

What’s in the box

How it’s built

On an Apple Silicon laptop, three seconds of audio transcribes in roughly 200 ms end-to-end.
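To put that figure in context, speech tools are often compared by real-time factor: processing time divided by audio duration. The sketch below is only an illustration of that arithmetic. The function names are hypothetical and are not part of Vocal.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not part of Vocal's code.

// Real-time factor: processing time divided by audio duration.
// A value below 1.0 means transcription runs faster than real time.
inline double real_time_factor(double processing_seconds, double audio_seconds) {
    assert(audio_seconds > 0.0);  // a zero-length clip has no meaningful factor
    return processing_seconds / audio_seconds;
}

// Speedup over real time: the inverse of the real-time factor.
inline double speedup(double processing_seconds, double audio_seconds) {
    assert(processing_seconds > 0.0);
    return audio_seconds / processing_seconds;
}
```

With the rough numbers quoted above (0.2 s of processing for 3.0 s of audio), `real_time_factor(0.2, 3.0)` comes out to about 0.067 and `speedup(0.2, 3.0)` to about 15, so roughly fifteen times faster than real time on that hardware.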

Design principles