GitHub Copilot costs $10/month for individuals and $19/user/month for business plans. That’s $120-228 per developer per year - and your code gets sent to Microsoft’s servers. There’s another option: run your own AI code assistant locally, with the same inline suggestions and chat features, on hardware you probably already own.
Tabby is an open-source, self-hosted alternative to GitHub Copilot with over 33,000 GitHub stars. It runs on consumer GPUs, Apple Silicon, and even CPU-only setups. Your code never leaves your machine.
This guide walks through setting up Tabby with Docker, connecting it to VS Code, and choosing the right model for your hardware.
What You’ll Get
After following this guide, you’ll have:
- AI code completion that suggests lines and functions as you type
- A chat interface for asking questions about your code
- Repository context awareness - Tabby can index your codebase for smarter suggestions
- Zero monthly fees
- Complete data privacy
Hardware Requirements
Tabby scales down surprisingly well. Here’s what you need for different model sizes:
Small models (1B-3B parameters):
- NVIDIA T4, GTX 1060+, RTX 2060+, or any 10/20-series GPU
- Apple Silicon M1/M2/M3/M4
- ~4GB VRAM minimum
Medium models (7B parameters):
- NVIDIA T4 or better
- Apple Silicon with 16GB+ unified memory
- ~8GB VRAM for int8 quantization
Large models (13B+):
- NVIDIA V100, A100, RTX 3090, or 40-series GPUs
- ~16GB+ VRAM
Don’t have a GPU? Tabby works on CPU too - just expect slower response times. Start with a 1B model and see if it’s usable for your workflow.
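As a back-of-envelope sizing check (my own rule of thumb, not an official Tabby sizing guide): a model needs roughly one byte per parameter at int8 quantization, or two bytes at fp16, plus ~20% headroom for activations and KV cache:

```shell
# Rough VRAM estimate - a hypothetical helper, not from Tabby's docs.
params_b=7   # model size in billions of parameters
bytes=1      # ~1 byte/weight for int8, ~2 for fp16
# Multiply by 1.2 to leave headroom for activations and KV cache.
est_gb=$(( params_b * bytes * 12 / 10 ))
echo "~${est_gb}GB VRAM for a ${params_b}B model"
```

By this estimate, a 7B model at int8 fits comfortably in 8GB of VRAM, which matches the requirements listed above.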
Step 1: Install Docker
If you don’t have Docker installed:
macOS:
```shell
brew install --cask docker
```
Ubuntu/Debian:
```shell
sudo apt update && sudo apt install docker.io
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect
```
Windows: Download Docker Desktop and install.
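Before moving on, it’s worth confirming Docker is actually on your PATH - a quick check like this avoids confusing errors in the next step:

```shell
# Verify docker is installed; print its version or a hint if missing.
if command -v docker >/dev/null 2>&1; then
  status="docker found: $(docker --version)"
else
  status="docker not found - install it before continuing"
fi
echo "$status"
```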
Step 2: Run Tabby Server
Choose your command based on your hardware:
NVIDIA GPU (recommended):
```shell
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve \
  --model Qwen2.5-Coder-1.5B \
  --chat-model Qwen2.5-Coder-1.5B-Instruct \
  --device cuda
```
Apple Silicon (Metal): Docker Desktop on macOS runs containers in a VM without GPU access, so Metal won’t work inside Docker. Run the native binary instead:

```shell
brew install tabbyml/tabby/tabby
tabby serve \
  --model Qwen2.5-Coder-1.5B \
  --chat-model Qwen2.5-Coder-1.5B-Instruct \
  --device metal
```
CPU only:
```shell
docker run -d \
  --name tabby \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve \
  --model Qwen2.5-Coder-0.5B \
  --chat-model Qwen2.5-Coder-0.5B-Instruct \
  --device cpu
```
The first run downloads the model - expect a few minutes depending on your connection.
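If you prefer Docker Compose to a long docker run invocation, the NVIDIA variant above maps to a compose file roughly like this (a sketch assuming Compose v2’s GPU device-reservation syntax):

```yaml
services:
  tabby:
    image: tabbyml/tabby
    restart: unless-stopped
    command: serve --model Qwen2.5-Coder-1.5B --chat-model Qwen2.5-Coder-1.5B-Instruct --device cuda
    ports:
      - "8080:8080"
    volumes:
      - $HOME/.tabby:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Then bring it up with docker compose up -d and down with docker compose down.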
Step 3: Create Your Account
- Open http://localhost:8080 in your browser
- Create an admin account (the first user becomes admin)
- Click your profile icon
- Copy your API token - you’ll need this for the IDE extension
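You can sanity-check the token from the command line before touching the IDE. Tabby exposes a completion endpoint at /v1/completions; this call (YOUR_TOKEN is a placeholder you must replace) prints either the JSON response or a failure hint:

```shell
# Replace YOUR_TOKEN with the token copied from the Tabby web UI.
resp=$(curl -sf -m 5 -X POST http://localhost:8080/v1/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "segments": {"prefix": "def fib(n):"}}' || true)
if [ -n "$resp" ]; then
  msg="token works - got completion: $resp"
else
  msg="request failed - is the server up and the token correct?"
fi
echo "$msg"
```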
Step 4: Connect VS Code
- Open VS Code
- Go to Extensions (Ctrl/Cmd+Shift+X)
- Search for “Tabby” and install the official extension
- Click the Tabby icon in the status bar (bottom right)
- Select “Connect to Server”
- Enter your server URL: http://localhost:8080
- Paste your API token
Test it: open any code file and start typing a function. You should see gray completion suggestions appear. Press Tab to accept.
Choosing the Right Model
Tabby supports dozens of models. Here are the best options for different scenarios:
Best for limited hardware (4-8GB VRAM):
- Qwen2.5-Coder-1.5B - Best quality-to-size ratio
- StarCoder-1B - Battle-tested, slightly older
- CodeGemma-2B - Google’s option, good for general coding
Best for quality (8-16GB VRAM):
- Qwen2.5-Coder-7B - Excellent across all languages
- DeepSeekCoder-6.7B - Strong on complex completions
- CodeLlama-7B - Meta’s coding model, well-tested
Best for power users (16GB+ VRAM):
- Qwen2.5-Coder-14B - Near-commercial quality
- Codestral-22B - Mistral’s coding specialist
- DeepSeek-Coder-V2-Lite - Mixture-of-experts architecture
To switch models, stop and restart Tabby with a different --model flag:
```shell
docker stop tabby && docker rm tabby
# Then run the docker command again with your new model
```
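If you switch models often, a small helper that prints the full command for a given model pair saves retyping. This is a hypothetical convenience script (not part of Tabby), shown in dry-run form so you can review the command before executing it:

```shell
# Hypothetical helper: print the docker run command for a model pair.
# Usage: sh run-tabby.sh [MODEL] [CHAT_MODEL]
model="${1:-Qwen2.5-Coder-7B}"
chat_model="${2:-${model}-Instruct}"
cat <<EOF
docker run -d --name tabby --gpus all -p 8080:8080 \\
  -v \$HOME/.tabby:/data tabbyml/tabby serve \\
  --model ${model} --chat-model ${chat_model} --device cuda
EOF
```

Pipe the output to sh once you’re happy with it.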
Adding Repository Context
Tabby can index your codebase for context-aware completions. This is where it starts to rival Copilot.
- In the Tabby web UI (http://localhost:8080), go to Settings > Repositories
- Add your local repository path or connect a GitHub/GitLab repo
- Wait for indexing to complete
Now when you code, Tabby understands your project’s patterns, naming conventions, and existing functions.
Troubleshooting
“Connection refused” errors:
```shell
# Check if Tabby is running
docker ps | grep tabby
# If it isn't, check the logs
docker logs tabby
```
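A quick way to distinguish "container down" from "extension misconfigured" is to hit Tabby’s health endpoint (/v1/health) directly; this snippet prints one line either way:

```shell
# Probe the Tabby health endpoint; fall back to a hint on failure.
if curl -sf -m 5 http://localhost:8080/v1/health >/dev/null 2>&1; then
  health="server is up"
else
  health="server unreachable - check docker logs tabby"
fi
echo "Tabby: $health"
```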
Slow completions:
- Try a smaller model
- Ensure you’re using GPU acceleration (check the --device flag)
- On NVIDIA, verify the driver is working: nvidia-smi
High memory usage: Models load entirely into memory. Close other GPU-heavy applications or use a smaller model.
macOS Metal not working: Docker containers on macOS run in a VM with no GPU passthrough, so Metal is unavailable inside Docker. Install and run Tabby’s native macOS build (e.g. via Homebrew) instead.
Tabby vs. Copilot: What You’re Trading
What you gain:
- Complete privacy - code never leaves your machine
- No recurring costs after hardware investment
- Works offline
- Customizable models
- Repository-aware context you control
What you lose:
- Copilot’s multi-model approach (GPT-4, Claude access)
- Slightly lower quality on very complex completions
- Initial setup time
- Responsibility for updates and maintenance
For most developers writing standard code - APIs, CRUD operations, data processing - you won’t notice the quality difference. Where Copilot still wins is on unusual edge cases and when you need to tap into GPT-4 or Claude for complex reasoning.
What’s Next
Once you’re comfortable with basic setup:
- Add team members: Tabby supports multi-user with authentication domains and SSO
- Try different chat models: Qwen2-1.5B-Instruct or Qwen3 models work well for the chat interface
- Index documentation: Add your internal docs as context sources
- Set up repository integration: Connect to GitHub/GitLab for PR context
Running your own AI coding assistant takes an hour to set up and saves you at minimum $120/year - more if you have a team. Your code stays private, you control the models, and you’re not dependent on anyone’s API staying online.
The only question is whether your hardware can keep up. Start with a small model, see if the speed is tolerable, and upgrade models as you find the limits. Most developers with any discrete GPU or recent Mac will find it perfectly usable.