Self-Host Tabby: Run Your Own AI Coding Assistant and Ditch GitHub Copilot

Step-by-step guide to deploying Tabby, the open-source AI coding assistant that keeps your code private and costs nothing after setup.


GitHub Copilot costs $10-39 per month for individuals and $19-39 per user per month for businesses. That’s $120-468 annually just to have an AI suggest code while it sends every keystroke to Microsoft’s servers. There’s another option: run the same capability yourself, keep your code private, and pay nothing beyond electricity.

Tabby is an open-source, self-hosted AI coding assistant with 33,000+ GitHub stars. It offers real-time code completion, chat assistance, and repository-level context—all running entirely on your own hardware.

What You Get

Tabby provides autocomplete suggestions as you type, similar to Copilot. The difference: your code never leaves your machine. No cloud API calls, no data collection, no subscription fees.

Key features:

  • Real-time code completion using Fill-in-the-Middle prompting
  • Chat interface for asking questions about your codebase
  • Repository indexing for context-aware suggestions
  • Support for 40+ languages, including Python, JavaScript, TypeScript, Rust, Go, and Java
  • IDE plugins for VS Code, JetBrains IDEs, and Vim/Neovim

The latest version (v0.32.0, released January 2026) added GitLab merge request context indexing and enhancements to the REST API documentation.

Hardware Requirements

For comfortable performance with 7B-class models, you need a GPU with at least 8GB VRAM; smaller models get by with less. Here’s what Tabby recommends:

| Model size | Minimum GPU | Recommended |
| --- | --- | --- |
| 1B parameters | Any modern GPU | RTX 2060 or better |
| 7B parameters | 8GB VRAM | RTX 3080, RTX 4070 |
| 13B parameters | 16GB+ VRAM | RTX 4090, A100 |
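The VRAM figures in the table above follow a simple back-of-envelope rule you can apply to any model: weights at roughly 2 bytes per parameter in FP16, plus ~20% headroom for the KV cache and activations. The function below is a rough sketch under those assumptions (the name and overhead factor are ours, not Tabby's):

```python
def approx_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                   overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: model weights at the given
    precision (FP16 = 2 bytes/param) plus ~20% for KV cache/activations."""
    return round(params_billions * bytes_per_param * overhead, 1)

# FP16 estimates roughly line up with the table above:
print(approx_vram_gb(1))                       # ~2.4 GB: any modern GPU
print(approx_vram_gb(7))                       # ~16.8 GB at full FP16
print(approx_vram_gb(7, bytes_per_param=1.0))  # ~8.4 GB with 8-bit quantization
```

The 7B row assumes quantized weights, which is why 8GB of VRAM is listed as the minimum while full FP16 would need closer to 16GB.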

For Mac users: Tabby runs on Apple Silicon. The StarCoder-1B model achieves approximately 90 tokens per second on an M2 Max—fast enough for real-time suggestions.

No GPU? Tabby supports CPU inference, though expect slower response times. A smaller model like StarCoder-1B remains usable on CPU-only systems.

Installation

Prerequisites

  1. Docker installed and running
  2. NVIDIA Container Toolkit (for GPU acceleration)

Install the NVIDIA Container Toolkit on Ubuntu/Debian:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Docker Compose Setup

Create a docker-compose.yml file:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Start Tabby:

docker compose up -d

The first launch downloads the models (roughly 500MB for StarCoder-1B). Wait a few minutes, then open http://localhost:8080 to create your admin account.
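To check on the server while the models download, you can poll the health endpoint instead of refreshing the browser. Tabby exposes `/v1/health` (verify the path against your version's API docs if this doesn't match); this small Python sketch queries it with only the standard library:

```python
import json
import urllib.error
import urllib.request


def tabby_health(base_url: str = "http://localhost:8080"):
    """Return Tabby's health info as a dict, or None if the server
    isn't reachable yet (e.g. still downloading models)."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health", timeout=5) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, OSError):
        return None


info = tabby_health()
print(info or "Tabby not reachable yet -- give the model download a few minutes")
```

Once it returns JSON instead of `None`, the server is ready and the web UI at http://localhost:8080 will load.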

CPU-Only Setup

No NVIDIA GPU? Remove the GPU reservation and change the device flag:

version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cpu
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"

Mac Installation

On Apple Silicon, use Homebrew:

brew install tabbyml/tabby/tabby
tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device metal

Connect Your IDE

VS Code

  1. Open Extensions (Cmd/Ctrl + Shift + X)
  2. Search “Tabby”
  3. Install the official Tabby extension
  4. Open the extension settings and set the server endpoint to http://localhost:8080
  5. Enter the auth token from your Tabby account page if prompted

The extension shows a connection status indicator. Once connected, you’ll see inline suggestions as you type.

JetBrains IDEs

  1. Open Settings > Plugins
  2. Search “Tabby” in Marketplace
  3. Install and restart
  4. Configure the server URL under Settings > Tools > Tabby

Vim/Neovim

For Neovim with lazy.nvim:

{
  "TabbyML/vim-tabby",
  lazy = false,
  dependencies = { "neovim/nvim-lspconfig" },
}

Configure in your init file:

vim.g.tabby_server_url = 'http://localhost:8080'

Choosing Models

Tabby separates completion and chat models. You can mix and match based on your hardware.

Completion Models

  • StarCoder-1B: Fast, lightweight, good for quick suggestions
  • StarCoder-3B: Better quality, still runs on 8GB VRAM
  • CodeLlama-7B: High quality, needs 8GB+ VRAM
  • CodeQwen-7B: Strong performance, especially for Python

Chat Models

  • Qwen2-1.5B-Instruct: Lightweight, good for basic questions
  • CodeLlama-7B-Instruct: Better reasoning capability
  • Mistral-7B-Instruct: Strong general-purpose chat

Check the models registry for the full list. You can also configure external providers like Ollama for more model options.
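Tabby reads model backends from its config file (`~/.tabby/config.toml` by default). As a sketch of the external-provider setup, a configuration pointing completion and chat at a local Ollama server looks roughly like this; the model names are examples (use whatever you've pulled with `ollama pull`), and the exact `kind` values are worth double-checking against Tabby's docs for your version:

```toml
# ~/.tabby/config.toml -- route models through a local Ollama server
# (model names below are examples, not requirements)
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b-code"
api_endpoint = "http://localhost:11434"

[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b-instruct"
api_endpoint = "http://localhost:11434/v1"
```

This lets you swap models with Ollama's tooling without restarting with different `--model` flags.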

Index Your Codebase

Tabby’s context-aware suggestions improve dramatically when you index your repositories.

  1. Open the Tabby web interface at http://localhost:8080
  2. Navigate to Code Browser > Repositories
  3. Add your repository URL or local path
  4. Wait for indexing to complete

Once indexed, Tabby considers your existing code patterns when suggesting completions.

Performance Tuning

Reduce Latency

Tabby includes an adaptive caching strategy. For additional speed:

  • Use a smaller model (1B-3B parameters)
  • Keep Tabby running continuously (cold starts are slow)
  • If using Docker, ensure the container has sufficient memory allocated

Multiple Users

For teams, run Tabby on a dedicated server. Each user connects their IDE to the central server URL. One GPU handles multiple concurrent users for typical coding sessions.

Want parallel workloads? Tabby doesn’t support tensor parallelism, but you can run separate instances on different GPUs using CUDA_VISIBLE_DEVICES:

services:
  tabby-completion:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=0
    command: serve --model CodeLlama-7B --device cuda
    # ... rest of config

  tabby-chat:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=1
    command: serve --chat-model Mistral-7B-Instruct --device cuda
    # ... rest of config

Cost Comparison

| Option | Monthly cost | Annual cost | Privacy |
| --- | --- | --- | --- |
| GitHub Copilot Pro | $10 | $120 | Code sent to cloud |
| GitHub Copilot Business | $19/user | $228/user | Code sent to cloud |
| GitHub Copilot Enterprise | $39/user | $468/user | Code sent to cloud |
| Tabby (self-hosted) | Electricity only | ~$20-50 | Fully private |

A dedicated GPU server costs roughly $2-4 per month in electricity to run continuously. Even factoring in the one-time hardware cost, self-hosting pays for itself within a year for a single developer.
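The $2-4 figure is easy to sanity-check. Assuming a server that idles most of the day at an average draw of 20-40W and a typical electricity rate of ~$0.15/kWh (both assumptions ours, so plug in your own numbers):

```python
def monthly_cost_usd(avg_watts: float, rate_per_kwh: float = 0.15,
                     hours: float = 730.0) -> float:
    """Electricity cost for a machine drawing avg_watts continuously
    for one month (~730 hours)."""
    return round(avg_watts / 1000 * hours * rate_per_kwh, 2)

# A mostly idle GPU server averaging 20-40W:
print(monthly_cost_usd(20))  # ~$2.19/month
print(monthly_cost_usd(40))  # ~$4.38/month
```

Even a pessimistic 60W average only lands around $6-7 per month, still far below any Copilot tier.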

The Trade-offs

Tabby isn’t a drop-in replacement with zero compromises:

Quality varies by model: Smaller models give faster but simpler suggestions. Some users on Hacker News noted that suggestions can be “junior level”—helpful for boilerplate, but you shouldn’t accept everything blindly.

No agentic features: Unlike some newer tools, Tabby focuses on completion and chat. It won’t autonomously run commands or modify multiple files.

Setup required: You’re managing infrastructure. Updates, model downloads, and configuration fall on you.

Fewer integrations: While VS Code and JetBrains support is solid, you won’t find the same breadth of IDE plugins as Copilot.

Who Should Use This

Tabby makes sense if:

  • You work with proprietary or sensitive code that can’t leave your network
  • Your company has data governance requirements prohibiting cloud AI tools
  • You want to avoid monthly subscription costs
  • You have unused GPU capacity (gaming PC, ML workstation)
  • You value controlling your development tools

For casual hobby projects where privacy doesn’t matter, Copilot’s free tier might be simpler. But for professional work on sensitive codebases, running your own coding assistant removes a class of risks entirely.

Get Started

  1. Clone the example configs or use the Docker Compose setup above
  2. Start with StarCoder-1B to verify everything works
  3. Upgrade to larger models as needed
  4. Index your repositories for context-aware suggestions

Your code stays yours. Your suggestions run locally. And after initial setup, it costs nothing.