Self-Host Tabby: Run Your Own AI Coding Assistant and Ditch GitHub Copilot

Step-by-step guide to deploying Tabby, the open-source AI coding assistant that keeps your code private and costs nothing after setup.


GitHub Copilot costs $10-39 per month for individuals and $19-39 per user per month for businesses. That’s $120-468 annually just to have an AI suggest code while it sends every keystroke to Microsoft’s servers. There’s another option: run the same capability yourself, keep your code private, and pay nothing beyond electricity.

Tabby is an open-source, self-hosted AI coding assistant with 33,000+ GitHub stars. It offers real-time code completion, chat assistance, and repository-level context—all running entirely on your own hardware.

What You Get

Tabby provides autocomplete suggestions as you type, similar to Copilot. The difference: your code never leaves your machine. No cloud API calls, no data collection, no subscription fees.

Key features:

  • Real-time code completion using Fill-in-the-Middle prompting
  • Chat interface for asking questions about your codebase
  • Repository indexing for context-aware suggestions
  • Support for 40+ languages, including Python, JavaScript, TypeScript, Rust, Go, and Java
  • IDE plugins for VS Code, JetBrains IDEs, and Vim/Neovim

The latest version (v0.32.0, released January 2026) added GitLab merge request context indexing and enhancements to the REST API documentation.

Hardware Requirements

For comfortable performance with 7B-class models, you need a GPU with at least 8GB VRAM; smaller models get by with less. Here’s what Tabby recommends:

| Model size | Minimum GPU | Recommended |
| --- | --- | --- |
| 1B parameters | Any modern GPU | RTX 2060 or better |
| 7B parameters | 8GB VRAM | RTX 3080, RTX 4070 |
| 13B parameters | 16GB+ VRAM | RTX 4090, A100 |
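The VRAM figures in the table above follow a simple back-of-envelope rule you can apply to any model: weights at roughly 2 bytes per parameter in FP16, plus ~20% headroom for the KV cache and activations. The function below is a rough sketch under those assumptions (the name and overhead factor are ours, not Tabby's):

```python
def approx_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                   overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: model weights at the given
    precision (FP16 = 2 bytes/param) plus ~20% for KV cache/activations."""
    return round(params_billions * bytes_per_param * overhead, 1)

# FP16 estimates roughly line up with the table above:
print(approx_vram_gb(1))                       # ~2.4 GB: any modern GPU
print(approx_vram_gb(7))                       # ~16.8 GB at full FP16
print(approx_vram_gb(7, bytes_per_param=1.0))  # ~8.4 GB with 8-bit quantization
```

The 7B row assumes quantized weights, which is why 8GB of VRAM is listed as the minimum while full FP16 would need closer to 16GB.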

For Mac users: Tabby runs on Apple Silicon. The StarCoder-1B model achieves approximately 90 tokens per second on an M2 Max—fast enough for real-time suggestions.

No GPU? Tabby supports CPU inference, though expect slower response times. A smaller model like StarCoder-1B remains usable on CPU-only systems.

Installation

Prerequisites

  1. Docker installed and running
  2. NVIDIA Container Toolkit (for GPU acceleration)

Install the NVIDIA Container Toolkit on Ubuntu/Debian:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Docker Compose Setup

Create a docker-compose.yml file:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Start Tabby:

docker compose up -d

The first launch downloads the models (roughly 500MB for StarCoder-1B). Wait a few minutes, then open http://localhost:8080 to create your admin account.
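To check on the server while the models download, you can poll the health endpoint instead of refreshing the browser. Tabby exposes `/v1/health` (verify the path against your version's API docs if this doesn't match); this small Python sketch queries it with only the standard library:

```python
import json
import urllib.error
import urllib.request


def tabby_health(base_url: str = "http://localhost:8080"):
    """Return Tabby's health info as a dict, or None if the server
    isn't reachable yet (e.g. still downloading models)."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health", timeout=5) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, OSError):
        return None


info = tabby_health()
print(info or "Tabby not reachable yet -- give the model download a few minutes")
```

Once it returns JSON instead of `None`, the server is ready and the web UI at http://localhost:8080 will load.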

CPU-Only Setup

No NVIDIA GPU? Remove the GPU reservation and change the device flag:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cpu
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"

Mac Installation

On Apple Silicon, use Homebrew:

brew install tabbyml/tabby/tabby
tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device metal

Connect Your IDE

VS Code

  1. Open Extensions (Cmd/Ctrl + Shift + X)
  2. Search “Tabby”
  3. Install the official Tabby extension
  4. Open the extension settings and set the server endpoint to http://localhost:8080
  5. Enter the auth token from your Tabby account page if prompted

The extension shows a connection status indicator. Once connected, you’ll see inline suggestions as you type.

JetBrains IDEs

  1. Open Settings > Plugins
  2. Search “Tabby” in Marketplace
  3. Install and restart
  4. Configure the server URL under Settings > Tools > Tabby

Vim/Neovim

For Neovim with lazy.nvim:

{
  "TabbyML/vim-tabby",
  lazy = false,
  dependencies = { "neovim/nvim-lspconfig" },
}

Configure in your init file:

vim.g.tabby_server_url = 'http://localhost:8080'

Choosing Models

Tabby separates completion and chat models. You can mix and match based on your hardware.

Completion Models

  • StarCoder-1B: Fast, lightweight, good for quick suggestions
  • StarCoder-3B: Better quality, still runs on 8GB VRAM
  • CodeLlama-7B: High quality, needs 8GB+ VRAM
  • CodeQwen-7B: Strong performance, especially for Python

Chat Models

  • Qwen2-1.5B-Instruct: Lightweight, good for basic questions
  • CodeLlama-7B-Instruct: Better reasoning capability
  • Mistral-7B-Instruct: Strong general-purpose chat

Check the models registry for the full list. You can also configure external providers like Ollama for more model options.
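Tabby reads model backends from its config file (`~/.tabby/config.toml` by default). As a sketch of the external-provider setup, a configuration pointing completion and chat at a local Ollama server looks roughly like this; the model names are examples (use whatever you've pulled with `ollama pull`), and the exact `kind` values are worth double-checking against Tabby's docs for your version:

```toml
# ~/.tabby/config.toml -- route models through a local Ollama server
# (model names below are examples, not requirements)
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b-code"
api_endpoint = "http://localhost:11434"

[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b-instruct"
api_endpoint = "http://localhost:11434/v1"
```

This lets you swap models with Ollama's tooling without restarting with different `--model` flags.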

Index Your Codebase

Tabby’s context-aware suggestions improve dramatically when you index your repositories.

  1. Open the Tabby web interface at http://localhost:8080
  2. Navigate to Code Browser > Repositories
  3. Add your repository URL or local path
  4. Wait for indexing to complete

Once indexed, Tabby considers your existing code patterns when suggesting completions.

Performance Tuning

Reduce Latency

Tabby includes an adaptive caching strategy. For additional speed:

  • Use a smaller model (1B-3B parameters)
  • Keep Tabby running continuously (cold starts are slow)
  • If using Docker, ensure the container has sufficient memory allocated

Multiple Users

For teams, run Tabby on a dedicated server. Each user connects their IDE to the central server URL. One GPU handles multiple concurrent users for typical coding sessions.

Want parallel workloads? Tabby doesn’t support tensor parallelism, but you can run separate instances on different GPUs using CUDA_VISIBLE_DEVICES:

services:
  tabby-completion:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=0
    command: serve --model CodeLlama-7B --device cuda
    # ... rest of config

  tabby-chat:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=1
    command: serve --chat-model Mistral-7B-Instruct --device cuda
    # ... rest of config

Cost Comparison

| Option | Monthly cost | Annual cost | Privacy |
| --- | --- | --- | --- |
| GitHub Copilot Pro | $10 | $120 | Code sent to cloud |
| GitHub Copilot Business | $19/user | $228/user | Code sent to cloud |
| GitHub Copilot Enterprise | $39/user | $468/user | Code sent to cloud |
| Tabby (self-hosted) | Electricity only | ~$20-50 | Fully private |

A dedicated GPU server costs roughly $2-4 per month in electricity to run continuously. Even factoring in the one-time hardware cost, self-hosting pays for itself within a year for a single developer.
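The $2-4 figure is easy to sanity-check. Assuming a server that idles most of the day at an average draw of 20-40W and a typical electricity rate of ~$0.15/kWh (both assumptions ours, so plug in your own numbers):

```python
def monthly_cost_usd(avg_watts: float, rate_per_kwh: float = 0.15,
                     hours: float = 730.0) -> float:
    """Electricity cost for a machine drawing avg_watts continuously
    for one month (~730 hours)."""
    return round(avg_watts / 1000 * hours * rate_per_kwh, 2)

# A mostly idle GPU server averaging 20-40W:
print(monthly_cost_usd(20))  # ~$2.19/month
print(monthly_cost_usd(40))  # ~$4.38/month
```

Even a pessimistic 60W average only lands around $6-7 per month, still far below any Copilot tier.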

The Trade-offs

Tabby isn’t a drop-in replacement with zero compromises:

Quality varies by model: Smaller models give faster but simpler suggestions. Some users on Hacker News noted that suggestions can be “junior level”—helpful for boilerplate, but you shouldn’t accept everything blindly.

No agentic features: Unlike some newer tools, Tabby focuses on completion and chat. It won’t autonomously run commands or modify multiple files.

Setup required: You’re managing infrastructure. Updates, model downloads, and configuration fall on you.

Fewer integrations: While VS Code and JetBrains support is solid, you won’t find the same breadth of IDE plugins as Copilot.

Who Should Use This

Tabby makes sense if:

  • You work with proprietary or sensitive code that can’t leave your network
  • Your company has data governance requirements prohibiting cloud AI tools
  • You want to avoid monthly subscription costs
  • You have unused GPU capacity (gaming PC, ML workstation)
  • You value controlling your development tools

For casual hobby projects where privacy doesn’t matter, Copilot’s free tier might be simpler. But for professional work on sensitive codebases, running your own coding assistant removes a class of risks entirely.

Get Started

  1. Clone the example configs or use the Docker Compose setup above
  2. Start with StarCoder-1B to verify everything works
  3. Upgrade to larger models as needed
  4. Index your repositories for context-aware suggestions

Your code stays yours. Your suggestions run locally. And after initial setup, it costs nothing.