Self-Host Whisper: Replace Otter.ai With Local Transcription

Set up faster-whisper on your own machine for private, free transcription. No subscriptions, no cloud uploads, no data training.

Every time you use Otter.ai, Descript, or similar cloud transcription services, your voice recordings travel to someone else’s servers. Otter.ai’s own privacy policy states they train their AI on “de-identified audio recordings” and transcriptions. They also share data with third-party “data labeling service providers” who access your conversations.

It gets worse. Otter.ai is currently facing a federal class action lawsuit alleging it secretly recorded private conversations and used them to train its models - without consent.

The alternative? Run transcription locally. Your audio never leaves your machine. No subscriptions. No data harvesting. Here’s how.

What You’ll Get

By the end of this guide, you’ll have:

  • faster-whisper running locally on your computer
  • Transcription that’s up to 4x faster than OpenAI’s original Whisper
  • Accuracy matching professional services (2.7% word error rate on clean audio)
  • Zero ongoing costs after setup

Hardware Requirements

faster-whisper is flexible. Here’s what works:

With GPU (NVIDIA):

  • GTX 900 series or newer
  • CUDA 12 and cuDNN 9
  • 4GB+ VRAM for medium models, 8GB+ for large-v3

CPU only:

  • Any modern x86_64 or ARM processor
  • 8GB+ RAM recommended
  • Slower, but perfectly usable with smaller models

Model size vs. hardware:

Model              VRAM/RAM   Speed      Best For
tiny               1GB        Fastest    Quick drafts, low-end hardware
base               1GB        Fast       Decent accuracy, older machines
small              2GB        Moderate   Good balance
medium             5GB        Slower     High accuracy
large-v3           10GB       Slowest    Best accuracy (2.7% WER)
distil-large-v3    6GB        Fast       Near-large accuracy, 6x faster
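If you want that table as code, a rough model picker might look like the sketch below. The helper and the budget numbers are mine (taken from the table above), not part of faster-whisper's API:

```python
# Models ordered from least to most accurate, with rough memory needs in GB.
# Illustrative only - measure on your own hardware before relying on it.
MODEL_BUDGETS_GB = [
    ("tiny", 1), ("base", 1), ("small", 2),
    ("medium", 5), ("distil-large-v3", 6), ("large-v3", 10),
]

def pick_model(available_gb: float) -> str:
    """Return the most accurate model that fits the given memory budget.

    Falls back to "tiny" if nothing fits.
    """
    best = "tiny"
    for name, needed_gb in MODEL_BUDGETS_GB:
        if needed_gb <= available_gb:
            best = name
    return best
```

For example, `pick_model(8)` lands on distil-large-v3, which matches the "sweet spot" recommendation later in this guide.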

Installation

Step 1: Install Python

Skip this if you already have Python 3.8+ installed. Check with:

python3 --version

Otherwise, grab it from python.org or use your package manager.

Step 2: Create a Virtual Environment

A virtual environment keeps your system clean:

python3 -m venv whisper-env
source whisper-env/bin/activate  # Linux/Mac
# or: whisper-env\Scripts\activate  # Windows

Step 3: Install faster-whisper

pip install faster-whisper

That’s it. Unlike the original Whisper, faster-whisper bundles FFmpeg via PyAV - no separate installation needed.

For GPU users: You’ll need CUDA 12 and cuDNN 9. NVIDIA’s CUDA Toolkit has installation guides. Alternatively, Purfview’s whisper-standalone-win bundles the required libraries.
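To check whether your CUDA setup is actually visible, you can query CTranslate2, the inference engine underneath faster-whisper. A small sketch, guarded so it still runs if the library isn't installed; `get_cuda_device_count` is CTranslate2's own API:

```python
def cuda_device_count() -> int:
    """Number of CUDA devices CTranslate2 can see (0 means fall back to CPU)."""
    try:
        # ctranslate2 is installed as a dependency of faster-whisper
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()

if __name__ == "__main__":
    print(f"CUDA devices visible: {cuda_device_count()}")
```

If this prints 0 despite having an NVIDIA card, your CUDA or cuDNN installation is the likely culprit.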

Basic Usage

Create a file called transcribe.py:

from faster_whisper import WhisperModel

# Load model (downloads automatically on first run)
model = WhisperModel("base", device="cpu", compute_type="int8")

# For GPU: model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("your_audio.mp3")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Run it:

python transcribe.py

The first run downloads the model (~150MB for base, ~3GB for large-v3). Subsequent runs load it from the local cache.

Practical Script: Batch Transcription

Here’s a more useful version that processes multiple files and saves output:

from faster_whisper import WhisperModel
from pathlib import Path
import sys

def transcribe_file(model, audio_path):
    """Transcribe a single file and return text."""
    segments, info = model.transcribe(str(audio_path))

    lines = []
    for segment in segments:
        lines.append(f"[{segment.start:.2f}s] {segment.text.strip()}")

    return "\n".join(lines), info.language

def main():
    if len(sys.argv) < 2:
        print("Usage: python transcribe.py <audio_file_or_directory>")
        sys.exit(1)

    # Adjust model and device based on your hardware
    model = WhisperModel("base", device="cpu", compute_type="int8")

    path = Path(sys.argv[1])

    if path.is_file():
        files = [path]
    else:
        files = list(path.glob("*.mp3")) + list(path.glob("*.wav")) + list(path.glob("*.m4a"))

    for audio_file in files:
        print(f"Transcribing: {audio_file.name}")

        text, language = transcribe_file(model, audio_file)

        output_file = audio_file.with_suffix(".txt")
        output_file.write_text(text, encoding="utf-8")

        print(f"  Language: {language}")
        print(f"  Saved to: {output_file}")

if __name__ == "__main__":
    main()

Usage:

# Single file
python transcribe.py meeting.mp3

# Entire directory
python transcribe.py ./recordings/
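The batch script writes plain text, but the segment timestamps also map directly onto the SRT subtitle format. A minimal sketch of that conversion (the helper names are mine):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm - SRT uses a comma before milliseconds."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn faster-whisper segments (objects with .start/.end/.text) into SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Swap `segments_to_srt` into the batch script (writing a `.srt` file instead of `.txt`) and you have subtitles for video files too.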

GUI Options (No Code Required)

If you’d rather not touch Python, several apps wrap Whisper with friendly interfaces:

Buzz (Windows/Mac/Linux)

  • Free and open source
  • Drag-and-drop audio/video files
  • Export as TXT, SRT, or VTT subtitles
  • Supports multiple Whisper backends

OpenWhispr (Windows/Mac/Linux)

  • Free and open source
  • Automatic text pasting after transcription
  • Built-in model management

WhisperUI Desktop (Windows/Mac)

  • Polished interface
  • $8/month Pro version (free tier available)

Distil-Whisper: The Speed Boost

If large-v3 is too slow but you want near-identical accuracy, try distil-large-v3:

  • 6x faster than large-v3
  • 50% smaller model size
  • Within 1% of large-v3’s word error rate

from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

This is the sweet spot for most users with a decent GPU.

Performance Comparison

Real-world transcription speed on a 10-minute audio file:

Setup                Model              Time
RTX 3080             large-v3           ~45 seconds
RTX 3080             distil-large-v3    ~8 seconds
M2 MacBook Pro       medium             ~90 seconds
Ryzen 5600X (CPU)    base               ~3 minutes
Ryzen 5600X (CPU)    tiny               ~45 seconds

Otter.ai processes the same file in about 60 seconds - but sends your audio to their servers.

Privacy Comparison

Feature                     Otter.ai               Local Whisper
Audio leaves your device    Yes                    No
Used for AI training        Yes (per ToS)          No
Third-party data sharing    Yes                    No
Works offline               No                     Yes
Subscription required       Yes ($16.99/mo Pro)    No

Troubleshooting

“CUDA not available”

Your GPU drivers or CUDA installation may be incomplete. Try CPU mode first:

model = WhisperModel("base", device="cpu", compute_type="int8")

Out of memory

Use a smaller model or enable INT8 quantization:

model = WhisperModel("medium", device="cuda", compute_type="int8")

Slow on CPU

Use tiny or base models. Consider the distil variants if you have a GPU.

Poor accuracy on accented speech

Use large-v3 or distil-large-v3; smaller models struggle with accents and background noise. If you know the spoken language, passing it explicitly - for example, model.transcribe("audio.mp3", language="en") - skips auto-detection and can improve results.

What This Means

Cloud transcription services positioned themselves as the only convenient option. That’s no longer true. faster-whisper delivers professional-grade accuracy with a few lines of Python - or a free GUI app.

The class action lawsuit against Otter.ai highlights what’s at stake. Every meeting you transcribe through a cloud service becomes training data. Every private conversation becomes someone else’s intellectual property.

Local transcription isn’t just a privacy choice. It’s free after setup, works offline, and puts you in control.

What You Can Do

  1. Start simple: Install faster-whisper with the base model and try it on a test file
  2. Upgrade later: Once you confirm it works, try larger models or GPU acceleration
  3. Replace subscriptions: Cancel Otter.ai and keep $200/year in your pocket
  4. Share the knowledge: Point colleagues to local alternatives when they complain about subscription costs

Your voice. Your data. Your machine.