Self-Host Whisper: Replace Otter.ai With Local Transcription

Set up faster-whisper on your own machine for private, free transcription. No subscriptions, no cloud uploads, no data training.

Every time you use Otter.ai, Descript, or similar cloud transcription services, your voice recordings travel to someone else’s servers. Otter.ai’s own privacy policy states they train their AI on “de-identified audio recordings” and transcriptions. They also share data with third-party “data labeling service providers” who access your conversations.

It gets worse. Otter.ai is currently facing a federal class action lawsuit alleging it secretly recorded private conversations and used them to train its models - without consent.

The alternative? Run transcription locally. Your audio never leaves your machine. No subscriptions. No data harvesting. Here’s how.

What You’ll Get

By the end of this guide, you’ll have:

  • faster-whisper running locally on your computer
  • Transcription that’s up to 4x faster than OpenAI’s original Whisper
  • Accuracy matching professional services (2.7% word error rate on clean audio)
  • Zero ongoing costs after setup

Hardware Requirements

faster-whisper is flexible. Here’s what works:

With GPU (NVIDIA):

  • GTX 900 series or newer
  • CUDA 12 and cuDNN 9
  • 4GB+ VRAM for medium models, 8GB+ for large-v3

CPU only:

  • Any modern x86_64 or ARM processor
  • 8GB+ RAM recommended
  • Slower, but perfectly usable with smaller models

Model size vs. hardware:

Model              VRAM/RAM   Speed      Best For
tiny               1GB        Fastest    Quick drafts, low-end hardware
base               1GB        Fast       Decent accuracy, older machines
small              2GB        Moderate   Good balance
medium             5GB        Slower     High accuracy
large-v3           10GB       Slowest    Best accuracy (2.7% WER)
distil-large-v3    6GB        Fast       Near-large accuracy, 6x faster
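If you want that table as code, a rough model picker might look like the sketch below. The helper and the budget numbers are mine (taken from the table above), not part of faster-whisper's API:

```python
# Models ordered from least to most accurate, with rough memory needs in GB.
# Illustrative only - measure on your own hardware before relying on it.
MODEL_BUDGETS_GB = [
    ("tiny", 1), ("base", 1), ("small", 2),
    ("medium", 5), ("distil-large-v3", 6), ("large-v3", 10),
]

def pick_model(available_gb: float) -> str:
    """Return the most accurate model that fits the given memory budget.

    Falls back to "tiny" if nothing fits.
    """
    best = "tiny"
    for name, needed_gb in MODEL_BUDGETS_GB:
        if needed_gb <= available_gb:
            best = name
    return best
```

For example, `pick_model(8)` lands on distil-large-v3, which matches the "sweet spot" recommendation later in this guide.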

Installation

Step 1: Install Python

Skip this if you already have Python 3.8+ installed. Check with:

python3 --version

Otherwise, grab it from python.org or use your package manager.

Step 2: Create a Virtual Environment

A virtual environment keeps your system clean:

python3 -m venv whisper-env
source whisper-env/bin/activate  # Linux/Mac
# or: whisper-env\Scripts\activate  # Windows

Step 3: Install faster-whisper

pip install faster-whisper

That’s it. Unlike the original Whisper, faster-whisper bundles FFmpeg via PyAV - no separate installation needed.

For GPU users: You’ll need CUDA 12 and cuDNN 9. NVIDIA’s CUDA Toolkit has installation guides. Alternatively, Purfview’s whisper-standalone-win bundles the required libraries.
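To check whether your CUDA setup is actually visible, you can query CTranslate2, the inference engine underneath faster-whisper. A small sketch, guarded so it still runs if the library isn't installed; `get_cuda_device_count` is CTranslate2's own API:

```python
def cuda_device_count() -> int:
    """Number of CUDA devices CTranslate2 can see (0 means fall back to CPU)."""
    try:
        # ctranslate2 is installed as a dependency of faster-whisper
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()

if __name__ == "__main__":
    print(f"CUDA devices visible: {cuda_device_count()}")
```

If this prints 0 despite having an NVIDIA card, your CUDA or cuDNN installation is the likely culprit.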

Basic Usage

Create a file called transcribe.py:

from faster_whisper import WhisperModel

# Load model (downloads automatically on first run)
model = WhisperModel("base", device="cpu", compute_type="int8")

# For GPU: model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcribe
segments, info = model.transcribe("your_audio.mp3")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Run it:

python transcribe.py

The first run downloads the model (~150MB for base, ~3GB for large-v3). Subsequent runs load it from the local cache.

Practical Script: Batch Transcription

Here’s a more useful version that processes multiple files and saves output:

from faster_whisper import WhisperModel
from pathlib import Path
import sys

def transcribe_file(model, audio_path):
    """Transcribe a single file and return text."""
    segments, info = model.transcribe(str(audio_path))

    lines = []
    for segment in segments:
        lines.append(f"[{segment.start:.2f}s] {segment.text.strip()}")

    return "\n".join(lines), info.language

def main():
    if len(sys.argv) < 2:
        print("Usage: python transcribe.py <audio_file_or_directory>")
        sys.exit(1)

    # Adjust model and device based on your hardware
    model = WhisperModel("base", device="cpu", compute_type="int8")

    path = Path(sys.argv[1])

    if path.is_file():
        files = [path]
    else:
        files = list(path.glob("*.mp3")) + list(path.glob("*.wav")) + list(path.glob("*.m4a"))

    for audio_file in files:
        print(f"Transcribing: {audio_file.name}")

        text, language = transcribe_file(model, audio_file)

        output_file = audio_file.with_suffix(".txt")
        output_file.write_text(text, encoding="utf-8")

        print(f"  Language: {language}")
        print(f"  Saved to: {output_file}")

if __name__ == "__main__":
    main()

Usage:

# Single file
python transcribe.py meeting.mp3

# Entire directory
python transcribe.py ./recordings/
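The batch script writes plain text, but the segment timestamps also map directly onto the SRT subtitle format. A minimal sketch of that conversion (the helper names are mine):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm - SRT uses a comma before milliseconds."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn faster-whisper segments (objects with .start/.end/.text) into SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Swap `segments_to_srt` into the batch script (writing a `.srt` file instead of `.txt`) and you have subtitles for video files too.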

GUI Options (No Code Required)

If you’d rather not touch Python, several apps wrap Whisper with friendly interfaces:

Buzz (Windows/Mac/Linux)

  • Free and open source
  • Drag-and-drop audio/video files
  • Export as TXT, SRT, or VTT subtitles
  • Supports multiple Whisper backends

OpenWhispr (Windows/Mac/Linux)

  • Free and open source
  • Automatic text pasting after transcription
  • Built-in model management

WhisperUI Desktop (Windows/Mac)

  • Polished interface
  • $8/month Pro version (free tier available)

Distil-Whisper: The Speed Boost

If large-v3 is too slow but you want near-identical accuracy, try distil-large-v3:

  • 6x faster than large-v3
  • 50% smaller model size
  • Within 1% of large-v3’s word error rate

from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

This is the sweet spot for most users with a decent GPU.

Performance Comparison

Real-world transcription speed on a 10-minute audio file:

Setup                Model              Time
RTX 3080             large-v3           ~45 seconds
RTX 3080             distil-large-v3    ~8 seconds
M2 MacBook Pro       medium             ~90 seconds
Ryzen 5600X (CPU)    base               ~3 minutes
Ryzen 5600X (CPU)    tiny               ~45 seconds

Otter.ai processes the same file in about 60 seconds - but sends your audio to their servers.

Privacy Comparison

Feature                     Otter.ai               Local Whisper
Audio leaves your device    Yes                    No
Used for AI training        Yes (per ToS)          No
Third-party data sharing    Yes                    No
Works offline               No                     Yes
Subscription required       Yes ($16.99/mo Pro)    No

Troubleshooting

“CUDA not available”

Your GPU drivers or CUDA installation may be incomplete. Try CPU mode first:

model = WhisperModel("base", device="cpu", compute_type="int8")

Out of memory

Use a smaller model or enable INT8 quantization:

model = WhisperModel("medium", device="cuda", compute_type="int8")

Slow on CPU

Use tiny or base models. Consider the distil variants if you have a GPU.

Poor accuracy on accented speech

Use large-v3 or distil-large-v3; smaller models struggle with accents and background noise. If you know the spoken language, passing it explicitly - for example, model.transcribe("audio.mp3", language="en") - skips auto-detection and can improve results.

What This Means

Cloud transcription services positioned themselves as the only convenient option. That’s no longer true. faster-whisper delivers professional-grade accuracy with a few lines of Python - or a free GUI app.

The class action lawsuit against Otter.ai highlights what’s at stake. Every meeting you transcribe through a cloud service becomes training data. Every private conversation becomes someone else’s intellectual property.

Local transcription isn’t just a privacy choice. It’s free after setup, works offline, and puts you in control.

What You Can Do

  1. Start simple: Install faster-whisper with the base model and try it on a test file
  2. Upgrade later: Once you confirm it works, try larger models or GPU acceleration
  3. Replace subscriptions: Cancel Otter.ai and keep $200/year in your pocket
  4. Share the knowledge: Point colleagues to local alternatives when they complain about subscription costs

Your voice. Your data. Your machine.