Self-Host Scriberr: Replace Otter.ai With Private AI Transcription

Stop paying $100/year for cloud transcription. Run Scriberr on your own hardware for free, private meeting notes with speaker identification.

Cloud transcription services like Otter.ai charge $100 per year to transcribe your meetings. They also store your audio on their servers, process it through their AI, and retain it according to their privacy policies. For confidential business discussions, legal consultations, or medical appointments, that’s a problem.

Scriberr offers an alternative: fully offline transcription with speaker identification, running entirely on your own hardware. No subscriptions, no cloud uploads, no third-party access to your conversations.

What Scriberr Does

Scriberr is a self-hosted application that transcribes audio and video files locally. It uses WhisperX, which combines OpenAI’s Whisper speech recognition with speaker diarization (identifying who said what) and precise word-level timestamps.

Key features:

  • Offline transcription using NVIDIA Parakeet, Canary, or Whisper models
  • Speaker detection that labels different speakers in the transcript
  • Chat integration with Ollama or OpenAI-compatible APIs for summarizing transcripts
  • Built-in recording to capture audio directly
  • Folder watcher that automatically processes new files
  • PWA support for desktop and mobile access

The transcription happens on your machine. Your audio never leaves your network.

Hardware Requirements

Scriberr runs on both CPU and GPU, but performance differs substantially:

CPU-only:

  • Works on any modern machine
  • Transcription speed: roughly real-time or slower (a 60-minute recording takes about 60 minutes)
  • Adequate for occasional use

GPU-accelerated:

  • Requires NVIDIA GPU with at least 4GB VRAM
  • GTX 1060 or better for basic acceleration
  • RTX 3060/4060 or better recommended for smooth performance
  • Transcription speed: 4-10x faster than real-time

For perspective, faster-whisper requires less than 8GB GPU memory for the large-v2 model with beam_size=5. A mid-range gaming GPU handles transcription comfortably.
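If you are unsure what GPU and how much VRAM you have, nvidia-smi can report both directly (assuming the NVIDIA driver is already installed):

```shell
# Query the GPU model and total VRAM (requires the NVIDIA driver)
nvidia-smi --query-gpu=name,memory.total --format=csv
# A typical result line looks like: NVIDIA GeForce RTX 3060, 12288 MiB
```

Anything at or above the 4GB minimum shown here will work for GPU acceleration.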

Installation: Docker Method

Docker is the easiest way to run Scriberr. You’ll need Docker and Docker Compose installed on your system.

CPU-Only Setup

Create a docker-compose.yml file:

services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:latest
    container_name: scriberr
    ports:
      - "8080:8080"
    environment:
      - PUID=1000
      - PGID=1000
      - SECURE_COOKIES=false
    volumes:
      - scriberr_data:/app/data
      - env_data:/app/env
    restart: unless-stopped

volumes:
  scriberr_data:
  env_data:

Then run:

docker compose up -d

GPU-Accelerated Setup (NVIDIA)

First, install the NVIDIA Container Toolkit on your host system. Verify it works with nvidia-smi.
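On Debian/Ubuntu hosts, installing the toolkit and registering it with Docker looks roughly like this (commands follow NVIDIA's install guide; repository URLs and package names can change, so check their documentation if a step fails):

```shell
# Add NVIDIA's package repository and install the container toolkit (Debian/Ubuntu)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```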

Create docker-compose.yml:

services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:v1.0.4-cuda
    container_name: scriberr
    ports:
      - "8080:8080"
    environment:
      - PUID=1000
      - PGID=1000
      - SECURE_COOKIES=false
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - scriberr_data:/app/data
      - env_data:/app/env
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  scriberr_data:
  env_data:

Run with:

docker compose up -d

For RTX 50-series (Blackwell) GPUs: Use the Blackwell-specific image instead:

image: ghcr.io/rishikanthc/scriberr:v1.0.4-blackwell

First Startup

The first launch takes several minutes. Scriberr downloads machine learning models (Whisper, PyAnnote for diarization, NVIDIA NeMo) and initializes the Python environment. Subsequent starts are fast because models persist in the volumes.

Access the web interface at http://localhost:8080.
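To watch the model downloads during that first start, follow the container logs:

```shell
# Follow Scriberr's startup logs; model download progress appears here
docker logs -f scriberr
```

Once the logs settle, the web interface is ready to use.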

Installation: Homebrew Method (macOS/Linux)

If you prefer running without Docker:

brew tap rishikanthc/scriberr
brew install scriberr
scriberr

This installs Scriberr as a native application. It uses your system’s Python environment and runs transcription using Apple Metal (on M-series Macs) or CPU.

Using Scriberr

Manual Transcription

  1. Open the web interface
  2. Upload an audio or video file (MP3, WAV, FLAC, M4A, MP4, etc.)
  3. Wait for transcription to complete
  4. View the transcript with speaker labels and timestamps

Automatic Processing

Set up folder watching to automatically transcribe new recordings:

  1. Configure a watched folder in Scriberr’s settings
  2. Point your recording software to save files there
  3. Transcripts appear automatically

This works well with screen recording tools, voice memo apps, or meeting recorders that save local files.
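If your recorder saves somewhere else, or in a video format, a small script can convert files and drop them into the watched folder. This is a sketch assuming ffmpeg is installed; ~/recordings and ~/scriberr-watch are placeholder paths for your source and watched directories:

```shell
#!/bin/sh
# Convert new screen recordings to MP3 and move them into Scriberr's watched folder.
# ~/recordings and ~/scriberr-watch are placeholders -- adjust for your setup.
for f in "$HOME"/recordings/*.mkv; do
  [ -e "$f" ] || continue            # skip if the glob matched nothing
  out="$HOME/scriberr-watch/$(basename "${f%.mkv}").mp3"
  ffmpeg -i "$f" -vn -acodec libmp3lame -q:a 2 "$out" && rm "$f"
done
```

Run it from cron or after each recording session; Scriberr picks up each MP3 as it lands.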

AI Summarization

Connect Scriberr to a local LLM through Ollama or any OpenAI-compatible API. After transcription, you can chat with the transcript to generate summaries, extract action items, or ask questions about the conversation.
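A minimal local-LLM setup, assuming Ollama is installed on the same host, is just pulling a model and pointing Scriberr at Ollama's OpenAI-compatible endpoint (the model choice below is an example, not a requirement):

```shell
# Pull a general-purpose model for summarization
ollama pull llama3.1

# Ollama serves an OpenAI-compatible API at this base URL by default:
#   http://localhost:11434/v1
# Enter that URL and the model name in Scriberr's chat settings.
```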

Understanding the Whisper Stack

Scriberr uses WhisperX, which builds on faster-whisper. Understanding this stack helps with troubleshooting and optimization.

faster-whisper reimplements OpenAI’s Whisper model using CTranslate2, a C++ inference engine. It runs up to 4x faster than the original Python implementation while using less memory, thanks to INT8/FP16 quantization.

WhisperX adds three capabilities on top:

  1. Voice Activity Detection (VAD): Identifies speech segments to reduce hallucinations on silent portions
  2. Forced alignment: Uses wav2vec2 to get precise word-level timestamps
  3. Speaker diarization: Uses pyannote-audio to identify different speakers

The tradeoff: WhisperX runs multiple models per audio file, so it uses more memory and processing time than plain faster-whisper. For meeting transcription where knowing who said what matters, that overhead is worth it.
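To see this pipeline outside Scriberr, WhisperX ships a CLI that exercises all three stages. A rough invocation (diarization requires a Hugging Face token with access to the pyannote models; flags per the WhisperX README):

```shell
# Install WhisperX into a Python environment
pip install whisperx

# Transcribe, align, and diarize one file
whisperx meeting.wav --model large-v2 --diarize --hf_token "$HF_TOKEN"
```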

Alternatives Worth Knowing

If Scriberr doesn’t fit your needs, consider these options:

faster-whisper via LinuxServer.io: A simpler container that provides raw transcription through the Wyoming protocol. Good for Home Assistant integration or if you don’t need speaker identification. Uses less memory.

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:latest
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=small
    volumes:
      - /path/to/config:/config
    ports:
      - 10300:10300
    restart: unless-stopped

WhisperX directly: If you’re comfortable with Python and want maximum control, run WhisperX as a library or use the whisperx-asr-service container for API access.

Meetily: A more polished meeting transcription tool with its own UI, designed specifically for meetings rather than general audio. Also self-hosted and open source.

Cost Comparison

Running your own transcription eliminates per-minute charges:

  Service                    Cost
  Otter.ai                   $99-299/year
  OpenAI Whisper API         $0.006/minute ($0.36/hour)
  Rev.ai                     $0.003-0.005/minute
  Self-hosted (Scriberr)     $0 (electricity only)

If you transcribe 10 hours of meetings monthly, Otter.ai costs about $100/year. The OpenAI API would cost about $3.60/month, or roughly $43/year. Self-hosting costs only electricity: a few cents per hour of GPU time.

The real benefit isn’t cost savings. It’s keeping your conversations private.

Why Self-Hosting Matters

Cloud transcription services store your audio. They process it through their systems. They may use it for training. Even with data protection promises, you’re trusting a third party with potentially sensitive information.

Self-hosted transcription stays local:

  • GDPR/HIPAA compliance: No third-party data processor agreements needed
  • Attorney-client privilege: Legal conversations never leave your network
  • Corporate confidentiality: Board meetings, strategy discussions, and HR matters stay internal
  • Personal privacy: Medical appointments, therapy sessions, and personal recordings remain private

Your audio files never touch the internet. The AI models run on your hardware. Transcripts stay on your storage.

Troubleshooting

“Unable to load audio stream” error: Set SECURE_COOKIES=false if accessing via HTTP instead of HTTPS.

Permission errors on Linux: Set PUID and PGID to your user’s UID and GID (find them with the id command). The default of 1000 works for most single-user systems.
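To confirm which values to use, query your own IDs and match them in the compose file:

```shell
# Print your numeric user and group IDs
id -u   # use this value for PUID
id -g   # use this value for PGID
```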

GPU not detected: Verify the NVIDIA Container Toolkit installation with docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi. If that fails, reinstall the toolkit.

Slow transcription on GPU: Check that Scriberr is actually using the GPU. The CUDA image should show GPU activity in nvidia-smi during transcription.

Models failing to download: First startup requires internet access to download ML models. After initial setup, Scriberr works fully offline.

What You Can Do

  1. Install Scriberr using the Docker method above—it takes about 15 minutes
  2. Test with a short recording to verify everything works
  3. Set up folder watching if you regularly record meetings
  4. Connect to Ollama for AI-powered summaries (optional)

Once running, you’ll never pay for transcription again. More importantly, your conversations stay yours.