GitHub Copilot costs $10-39 per month for individuals and $19-39 per user per month for businesses. That’s $120-468 annually just to have an AI suggest code while it sends every keystroke to Microsoft’s servers. There’s another option: run the same capability yourself, keep your code private, and pay nothing beyond electricity.
Tabby is an open-source, self-hosted AI coding assistant with 33,000+ GitHub stars. It offers real-time code completion, chat assistance, and repository-level context—all running entirely on your own hardware.
What You Get
Tabby provides autocomplete suggestions as you type, similar to Copilot. The difference: your code never leaves your machine. No cloud API calls, no data collection, no subscription fees.
Key features:
- Real-time code completion using Fill-in-the-Middle prompting
- Chat interface for asking questions about your codebase
- Repository indexing for context-aware suggestions
- 40+ language support including Python, JavaScript, TypeScript, Rust, Go, and Java
- IDE plugins for VS Code, JetBrains IDEs, and Vim/Neovim
The latest version (v0.32.0, released January 2026) added GitLab merge request context indexing and REST API documentation enhancement capabilities.
Hardware Requirements
You need a GPU with at least 8GB VRAM for reasonable performance. Here’s what Tabby recommends:
| Model Size | Minimum GPU | Recommended |
|---|---|---|
| 1B parameters | Any modern GPU | RTX 2060 or better |
| 7B parameters | 8GB VRAM | RTX 3080, RTX 4070 |
| 13B parameters | 16GB+ VRAM | RTX 4090, A100 |
For Mac users: Tabby runs on Apple Silicon. The StarCoder-1B model achieves approximately 90 tokens per second on an M2 Max—fast enough for real-time suggestions.
No GPU? Tabby supports CPU inference, though expect slower response times. A smaller model like StarCoder-1B remains usable on CPU-only systems.
Installation
Prerequisites
- Docker installed and running
- NVIDIA Container Toolkit (for GPU acceleration)
Install the NVIDIA Container Toolkit on Ubuntu/Debian:
```shell
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
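Before launching Tabby, it's worth confirming that Docker can actually see the GPU. One way (assuming the standard CUDA base image; any recent tag works):

```shell
# If the toolkit is wired up correctly, this prints your GPU table via nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

If this fails with a runtime error, fix the toolkit installation before debugging Tabby itself.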
Docker Compose Setup
Create a docker-compose.yml file:
```yaml
version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
Start Tabby:
```shell
docker compose up -d
```
The first launch downloads the models (roughly 500MB for StarCoder-1B). Wait a few minutes, then open http://localhost:8080 to create your admin account.
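You can also check readiness from the command line. Recent Tabby releases expose a health endpoint (path shown here as in current docs; verify against your version's API reference):

```shell
# Returns JSON with the loaded model, device, and version once the server is up
curl -s http://localhost:8080/v1/health
```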
CPU-Only Setup
No NVIDIA GPU? Remove the GPU reservation and change the device flag:
```yaml
version: '3.5'
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cpu
    volumes:
      - "./tabby-data:/data"
    ports:
      - "8080:8080"
```
Mac Installation
On Apple Silicon, use Homebrew:
```shell
brew install tabbyml/tabby/tabby
tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device metal
```
Connect Your IDE
VS Code
- Open Extensions (Cmd/Ctrl + Shift + X)
- Search “Tabby”
- Install the official Tabby extension
- Open settings and set the server URL to http://localhost:8080
- Authenticate if you set up authentication
The extension shows a connection status indicator. Once connected, you’ll see inline suggestions as you type.
JetBrains IDEs
- Open Settings > Plugins
- Search “Tabby” in Marketplace
- Install and restart
- Configure the server URL under Settings > Tools > Tabby
Vim/Neovim
For Neovim with lazy.nvim:
```lua
{
  "TabbyML/vim-tabby",
  lazy = false,
  dependencies = { "neovim/nvim-lspconfig" },
}
```
Configure in your init file:
```lua
vim.g.tabby_server_url = 'http://localhost:8080'
```
Choosing Models
Tabby separates completion and chat models. You can mix and match based on your hardware.
Completion Models
- StarCoder-1B: Fast, lightweight, good for quick suggestions
- StarCoder-3B: Better quality, still runs on 8GB VRAM
- CodeLlama-7B: High quality, needs 8GB+ VRAM
- CodeQwen-7B: Strong performance, especially for Python
Chat Models
- Qwen2-1.5B-Instruct: Lightweight, good for basic questions
- CodeLlama-7B-Instruct: Better reasoning capability
- Mistral-7B-Instruct: Strong general-purpose chat
Check the models registry for the full list. You can also configure external providers like Ollama for more model options.
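As a sketch of the external-provider route, an HTTP model can be declared in Tabby's `~/.tabby/config.toml`. The field names follow Tabby's model-configuration docs; the Ollama model name here is an assumption — substitute whatever you have pulled locally:

```toml
# ~/.tabby/config.toml — route completion requests to a local Ollama server
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b-code"
api_endpoint = "http://localhost:11434"
```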
Index Your Codebase
Tabby’s context-aware suggestions improve dramatically when you index your repositories.
- Open the Tabby web interface at http://localhost:8080
- Navigate to Code Browser > Repositories
- Add your repository URL or local path
- Wait for indexing to complete
Once indexed, Tabby considers your existing code patterns when suggesting completions.
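You can sanity-check completions directly against the HTTP API. The request shape below follows Tabby's `/v1/completions` endpoint; adjust the fields if your version's API docs differ:

```shell
# Ask the server to complete a Python prefix; returns JSON with suggestion choices
curl -s http://localhost:8080/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"language": "python", "segments": {"prefix": "def fibonacci(n):\n    "}}'
```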
Performance Tuning
Reduce Latency
Tabby includes an adaptive caching strategy. For additional speed:
- Use a smaller model (1B-3B parameters)
- Keep Tabby running continuously (cold starts are slow)
- If using Docker, ensure the container has sufficient memory allocated
Multiple Users
For teams, run Tabby on a dedicated server. Each user connects their IDE to the central server URL. One GPU handles multiple concurrent users for typical coding sessions.
Want parallel workloads? Tabby doesn’t support tensor parallelism, but you can run separate instances on different GPUs using CUDA_VISIBLE_DEVICES:
```yaml
services:
  tabby-completion:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=0
    command: serve --model CodeLlama-7B --device cuda
    # ... rest of config
  tabby-chat:
    image: tabbyml/tabby
    environment:
      - CUDA_VISIBLE_DEVICES=1
    command: serve --chat-model Mistral-7B-Instruct --device cuda
    # ... rest of config
```
Cost Comparison
| Option | Monthly Cost | Annual Cost | Privacy |
|---|---|---|---|
| GitHub Copilot Pro | $10 | $120 | Code sent to cloud |
| GitHub Copilot Business | $19/user | $228/user | Code sent to cloud |
| GitHub Copilot Enterprise | $39/user | $468/user | Code sent to cloud |
| Tabby (self-hosted) | ~$2-4 (electricity) | ~$20-50 | Fully private |
A dedicated GPU server costs roughly $2-4 per month in electricity to run continuously. If you already own a suitable GPU, self-hosting pays for itself immediately; even buying a mid-range card, the cost is recovered within a year or two against a Copilot Business seat.
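The electricity figure is easy to sanity-check. Assuming a mostly idle server averaging around 30 W (an assumption — measure your own machine) and a $0.14/kWh rate:

```shell
# Monthly energy cost: average watts -> kWh over 30 days, times the per-kWh rate
awk 'BEGIN { watts = 30; rate = 0.14; kwh = watts * 24 * 30 / 1000; printf "%.1f kWh/month -> $%.2f\n", kwh, kwh * rate }'
```

A box under sustained inference load draws more, but for a single developer the duty cycle is low.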
The Trade-offs
Tabby isn’t a drop-in replacement with zero compromises:
Quality varies by model: Smaller models give faster but simpler suggestions. Some users on Hacker News noted that suggestions can be “junior level”—helpful for boilerplate, but you shouldn’t accept everything blindly.
No agentic features: Unlike some newer tools, Tabby focuses on completion and chat. It won’t autonomously run commands or modify multiple files.
Setup required: You’re managing infrastructure. Updates, model downloads, and configuration fall on you.
Fewer integrations: While VS Code and JetBrains support is solid, you won’t find the same breadth of IDE plugins as Copilot.
Who Should Use This
Tabby makes sense if:
- You work with proprietary or sensitive code that can’t leave your network
- Your company has data governance requirements prohibiting cloud AI tools
- You want to avoid monthly subscription costs
- You have unused GPU capacity (gaming PC, ML workstation)
- You value controlling your development tools
For casual hobby projects where privacy doesn’t matter, Copilot’s free tier might be simpler. But for professional work on sensitive codebases, running your own coding assistant removes a class of risks entirely.
Get Started
- Clone the example configs or use the Docker Compose setup above
- Start with StarCoder-1B to verify everything works
- Upgrade to larger models as needed
- Index your repositories for context-aware suggestions
Your code stays yours. Your suggestions run locally. And after initial setup, it costs nothing.