Every time you use Otter.ai, Descript, or similar cloud transcription services, your voice recordings travel to someone else’s servers. Otter.ai’s own privacy policy states they train their AI on “de-identified audio recordings” and transcriptions. They also share data with third-party “data labeling service providers” who access your conversations.
It gets worse. Otter.ai is currently facing a federal class action lawsuit alleging it secretly recorded private conversations and used them to train its models - without consent.
The alternative? Run transcription locally. Your audio never leaves your machine. No subscriptions. No data harvesting. Here’s how.
What You’ll Get
By the end of this guide, you’ll have:
- faster-whisper running locally on your computer
- Transcription up to 4x faster than OpenAI’s original Whisper, at the same accuracy
- Accuracy matching professional services (2.7% word error rate on clean audio)
- Zero ongoing costs after setup
Hardware Requirements
faster-whisper is flexible. Here’s what works:
With GPU (NVIDIA):
- GTX 900 series or newer
- CUDA 12 and cuDNN 9
- 4GB+ VRAM for medium models, 8GB+ for large-v3
CPU only:
- Any modern x86_64 or ARM processor
- 8GB+ RAM recommended
- Slower, but perfectly usable with smaller models
Model size vs. hardware:
| Model | VRAM/RAM | Speed | Best For |
|---|---|---|---|
| tiny | 1GB | Fastest | Quick drafts, low-end hardware |
| base | 1GB | Fast | Decent accuracy, older machines |
| small | 2GB | Moderate | Good balance |
| medium | 5GB | Slower | High accuracy |
| large-v3 | 10GB | Slowest | Best accuracy (2.7% WER) |
| distil-large-v3 | 6GB | Fast | Near-large accuracy, 6x faster |
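If you want to encode the table above as a starting point in code, a small helper can suggest a model size from the memory you have free. This is an illustrative sketch using the table's figures, not part of faster-whisper; the function name and thresholds are mine.

```python
def pick_model(available_gb, want_speed=False):
    """Suggest a Whisper model size for a given amount of free VRAM/RAM (GB).

    Thresholds follow the model table above; purely illustrative.
    """
    if available_gb >= 10:
        return "distil-large-v3" if want_speed else "large-v3"
    if available_gb >= 6 and want_speed:
        return "distil-large-v3"
    if available_gb >= 5:
        return "medium"
    if available_gb >= 2:
        return "small"
    return "base"
```

For example, `pick_model(12)` suggests `large-v3`, while `pick_model(4)` falls back to `small`.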
Installation
Step 1: Install Python
Skip this if you already have Python 3.8+ installed. Check with:
```bash
python3 --version
```
Otherwise, grab it from python.org or use your package manager.
Step 2: Create a Virtual Environment (Recommended)
Keeps your system clean:
```bash
python3 -m venv whisper-env
source whisper-env/bin/activate    # Linux/Mac
# or: whisper-env\Scripts\activate   # Windows
```
Step 3: Install faster-whisper
```bash
pip install faster-whisper
```
That’s it. Unlike the original Whisper, faster-whisper bundles FFmpeg via PyAV - no separate installation needed.
For GPU users: You’ll need CUDA 12 and cuDNN 9. NVIDIA’s CUDA Toolkit has installation guides. Alternatively, Purfview’s whisper-standalone-win bundles the required libraries.
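Before loading a large model onto the GPU, you can check whether faster-whisper's backend (CTranslate2) can actually see a CUDA device. A minimal sketch, assuming faster-whisper (and therefore `ctranslate2`) is installed; it falls back to 0 rather than crashing when it isn't:

```python
def cuda_devices():
    """Return how many CUDA devices CTranslate2 can see (0 if none, or not installed)."""
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()

# A count of 0 means you should pass device="cpu" when loading a model.
print("CUDA devices visible:", cuda_devices())
```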
Basic Usage
Create a file called transcribe.py:
```python
from faster_whisper import WhisperModel

# Load model (downloads automatically on first run)
model = WhisperModel("base", device="cpu", compute_type="int8")
# For GPU: model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("your_audio.mp3")
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
Run it:

```bash
python transcribe.py
```

The first run downloads the model (~150MB for base, ~3GB for large-v3) and caches it locally; subsequent runs load it straight from disk.
Practical Script: Batch Transcription
Here’s a more useful version that processes multiple files and saves output:
```python
from faster_whisper import WhisperModel
from pathlib import Path
import sys


def transcribe_file(model, audio_path):
    """Transcribe a single file and return (text, detected language)."""
    segments, info = model.transcribe(str(audio_path))
    lines = []
    for segment in segments:
        lines.append(f"[{segment.start:.2f}s] {segment.text.strip()}")
    return "\n".join(lines), info.language


def main():
    if len(sys.argv) < 2:
        print("Usage: python transcribe.py <audio_file_or_directory>")
        sys.exit(1)

    # Adjust model and device based on your hardware
    model = WhisperModel("base", device="cpu", compute_type="int8")

    path = Path(sys.argv[1])
    if path.is_file():
        files = [path]
    else:
        files = list(path.glob("*.mp3")) + list(path.glob("*.wav")) + list(path.glob("*.m4a"))

    for audio_file in files:
        print(f"Transcribing: {audio_file.name}")
        text, language = transcribe_file(model, audio_file)
        output_file = audio_file.with_suffix(".txt")
        output_file.write_text(text, encoding="utf-8")
        print(f"  Language: {language}")
        print(f"  Saved to: {output_file}")


if __name__ == "__main__":
    main()
```
Usage:
```bash
# Single file
python transcribe.py meeting.mp3

# Entire directory
python transcribe.py ./recordings/
```
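Since each segment carries start and end times, turning the batch output into SRT subtitles is mostly a formatting exercise. A sketch (the helper names are mine, not part of faster-whisper); feed it `(segment.start, segment.end, segment.text)` triples:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) triples as an SRT subtitle document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

Write the result with a `.srt` suffix instead of `.txt` and any video player will pick it up.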
GUI Options (No Code Required)
If you’d rather not touch Python, several apps wrap Whisper with friendly interfaces:
Buzz (Windows/Mac/Linux)
- Free and open source
- Drag-and-drop audio/video files
- Export as TXT, SRT, or VTT subtitles
- Supports multiple Whisper backends
OpenWhispr (Windows/Mac/Linux)
- Free and open source
- Automatic text pasting after transcription
- Built-in model management
WhisperUI Desktop (Windows/Mac)
- Polished interface
- $8/month Pro version (free tier available)
Distil-Whisper: The Speed Boost
If large-v3 is too slow but you want near-identical accuracy, try distil-large-v3:
- 6x faster than large-v3
- 50% smaller model size
- Within 1% word error rate
```python
from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")
```
This is the sweet spot for most users with a decent GPU.
Performance Comparison
Real-world transcription speed on a 10-minute audio file:
| Setup | Model | Time |
|---|---|---|
| RTX 3080 | large-v3 | ~45 seconds |
| RTX 3080 | distil-large-v3 | ~8 seconds |
| M2 MacBook Pro | medium | ~90 seconds |
| Ryzen 5600X (CPU) | base | ~3 minutes |
| Ryzen 5600X (CPU) | tiny | ~45 seconds |
Otter.ai processes the same file in about 60 seconds - but sends your audio to their servers.
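These timings will vary with your clip and hardware, so it's worth measuring your own setup. A minimal timing wrapper (plain stdlib, nothing faster-whisper-specific):

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# With a model loaded as in the earlier examples:
# (segments, info), elapsed = timed(model.transcribe, "your_audio.mp3")
# Note: segments is a lazy generator - consume it before trusting the timing.
```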
Privacy Comparison
| Feature | Otter.ai | Local Whisper |
|---|---|---|
| Audio leaves your device | Yes | No |
| Used for AI training | Yes (per ToS) | No |
| Third-party data sharing | Yes | No |
| Works offline | No | Yes |
| Subscription required | Yes ($16.99/mo Pro) | No |
Troubleshooting
“CUDA not available”
Your GPU drivers or CUDA installation may be incomplete. Try CPU mode first:
```python
model = WhisperModel("base", device="cpu", compute_type="int8")
```
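If you want scripts that run on any machine, a try-GPU-then-CPU loader is a common pattern. A sketch - the exact exception raised when CUDA is missing can vary, so this catches broadly; the model class is passed in so the fallback logic itself doesn't require a GPU (or faster-whisper) to exercise:

```python
def load_model(model_cls, size="base"):
    """Try CUDA first; fall back to CPU with int8 quantization.

    model_cls is normally WhisperModel; it's a parameter so the
    fallback path can be tested without a GPU.
    """
    try:
        return model_cls(size, device="cuda", compute_type="float16")
    except (ValueError, RuntimeError):
        return model_cls(size, device="cpu", compute_type="int8")


# Usage:
# from faster_whisper import WhisperModel
# model = load_model(WhisperModel, "base")
```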
Out of memory
Use a smaller model or enable INT8 quantization:
```python
model = WhisperModel("medium", device="cuda", compute_type="int8")
```
Slow on CPU
Use tiny or base models. Consider the distil variants if you have a GPU.
Poor accuracy on accented speech
Use large-v3 or distil-large-v3. Smaller models struggle with accents and background noise.
What This Means
Cloud transcription services positioned themselves as the only convenient option. That’s no longer true. faster-whisper delivers professional-grade accuracy with a few lines of Python - or a free GUI app.
The class action lawsuit against Otter.ai highlights what’s at stake. Every meeting you transcribe through a cloud service becomes training data. Every private conversation becomes someone else’s intellectual property.
Local transcription isn’t just a privacy choice. It’s free after setup, works offline, and puts you in control.
What You Can Do
- Start simple: Install faster-whisper with the base model and try it on a test file
- Upgrade later: Once you confirm it works, try larger models or GPU acceleration
- Replace subscriptions: Cancel Otter.ai and keep $200/year in your pocket
- Share the knowledge: Point colleagues to local alternatives when they complain about subscription costs
Your voice. Your data. Your machine.