GitHub Copilot costs $10-39 per month, sends your code to Microsoft’s servers, and on the free and Pro tiers, likely uses your code to train future models. For anyone working with proprietary code or wanting to keep their work private, that’s a problem.
The alternative: run your own code completion locally. It takes about 15 minutes to set up, costs nothing, and your code never leaves your machine.
What You’ll Get
By the end of this guide, you’ll have tab-autocomplete working in VS Code that:
- Runs entirely on your hardware
- Works offline
- Sends zero data to any external server
- Costs nothing after initial setup
- Provides quality comparable to Copilot for most tasks
The tradeoff: you need decent hardware, and completions may be slightly slower than cloud-based alternatives.
Requirements
Minimum hardware:
- 8GB RAM (16GB recommended)
- Modern CPU with AVX2 support
- 10GB free disk space
Recommended for best performance:
- 16GB+ RAM
- NVIDIA GPU with 8GB+ VRAM, or Apple Silicon Mac
- SSD storage
If you have an M1/M2/M3 Mac, you’re in luck. Apple Silicon runs local models extremely well thanks to unified memory architecture.
Step 1: Install Ollama
Ollama is the simplest way to run LLMs locally. It handles model downloads, memory management, and provides an API that Continue can talk to.
macOS:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from ollama.com/download.
After installation, start the Ollama service:
ollama serve
On macOS, Ollama runs as a background service automatically after installation.
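To confirm the service is reachable, you can hit its API on port 11434 (Ollama's default). A minimal sketch using only the Python standard library; the /api/tags endpoint lists installed models and needs no request body:

```python
import json
import urllib.request

# Sanity check that the Ollama service is reachable on its default port.
def ollama_is_up(base_url="http://localhost:11434"):
    """Return True if the Ollama API responds with valid JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.load(resp)  # a valid JSON body means the API is healthy
        return True
    except (OSError, ValueError):
        return False

if __name__ == "__main__":
    print("Ollama running:", ollama_is_up())
```

If this prints False, the service isn't started yet; run ollama serve and try again.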
Step 2: Download a Code Model
Not all LLMs are good at code completion. You want a model specifically trained for fill-in-the-middle (FIM) tasks, which is what tab-autocomplete requires.
As of this writing, the strongest option at its size is Qwen2.5-Coder. It scores 88.4% on HumanEval (a standard coding benchmark) at just 7 billion parameters, matching or beating models 3-5x its size.
Download it:
ollama pull qwen2.5-coder:7b
This downloads about 4.5GB. The model needs roughly 6GB of RAM/VRAM when running.
If 7B is too slow on your hardware, try the smaller versions:
ollama pull qwen2.5-coder:3b # Faster, still good quality
ollama pull qwen2.5-coder:1.5b # Fastest, acceptable quality
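Under the hood, tab-autocomplete sends the code before and after your cursor to Ollama's /api/generate endpoint, using the suffix field for fill-in-the-middle. A rough sketch of the request shape, assuming a FIM-capable model like the one just pulled (the num_predict cap is an illustrative choice, not a required setting):

```python
import json

def build_fim_request(prefix, suffix, model="qwen2.5-coder:7b"):
    """Build the JSON body for a fill-in-the-middle completion request."""
    payload = {
        "model": model,
        "prompt": prefix,                # code before the cursor
        "suffix": suffix,                # code after the cursor
        "stream": False,
        "options": {"num_predict": 64},  # cap the completion length
    }
    return json.dumps(payload).encode("utf-8")

body = build_fim_request("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
# To actually send it (requires Ollama running locally):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=body,
#                                headers={"Content-Type": "application/json"})
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
```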
Step 3: Install Continue
Continue is an open-source VS Code extension that provides AI-assisted coding. Unlike Copilot, it lets you use any model backend, including local ones.
Install it from VS Code:
- Open VS Code
- Go to Extensions (Ctrl/Cmd+Shift+X)
- Search for “Continue”
- Install the one by Continue.dev
After installation, you’ll see a Continue icon in your sidebar.
Step 4: Configure Autocomplete
Continue needs to be told to use your local Ollama instance for autocomplete. Open the Continue settings:
- Click the Continue icon in the sidebar
- Click the gear icon (settings)
- Click “Open config.json”
Add or modify the tabAutocompleteModel section:
{
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder (Local)",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}
If you downloaded a different model size, adjust the model name accordingly.
Save the file. Continue will automatically reload the configuration.
Step 5: Enable and Test
Make sure autocomplete is enabled:
- Open VS Code Settings (Ctrl/Cmd+,)
- Search for “Continue autocomplete”
- Ensure “Enable Tab Autocomplete” is checked
Now test it. Open or create a Python file and start typing:
def calculate_fibonacci(n):
    """Calculate the nth Fibonacci number."""
Pause after the docstring. You should see a ghost text suggestion appear. Press Tab to accept it.
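For reference, an iterative implementation like this is typical of what the model suggests here (exact output varies between runs and model sizes):

```python
def calculate_fibonacci(n):
    """Calculate the nth Fibonacci number."""
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

print(calculate_fibonacci(10))  # 55
```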
If nothing appears, check that:
- Ollama is running (ollama serve)
- The model is downloaded (ollama list)
- Continue is properly configured
Tuning for Speed
If completions feel slow, you have several options.
Use a smaller model: Change your config to use the 3B or 1.5B version. The quality difference is noticeable but acceptable for most autocomplete tasks:
{
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "qwen2.5-coder:3b"
  }
}
Adjust debounce timing: Continue waits a moment after you stop typing before requesting completions. You can tune this:
{
  "tabAutocompleteOptions": {
    "debounceDelay": 500
  }
}
Lower values mean faster suggestions but more compute load.
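The tradeoff is easiest to see in a toy simulation: with a 500 ms delay, a burst of rapid keystrokes coalesces into a single request, and only pauses trigger completions. (Illustrative only; this is not Continue's actual implementation.)

```python
# Toy simulation of a debounce delay: a completion request fires only when
# at least delay_ms elapse after a keystroke before the next one arrives.
def requests_fired(keystroke_times_ms, delay_ms=500):
    """Return the keystroke timestamps that would trigger a completion request."""
    fired = []
    for i, t in enumerate(keystroke_times_ms):
        is_last = i + 1 == len(keystroke_times_ms)
        if is_last or keystroke_times_ms[i + 1] - t >= delay_ms:
            fired.append(t)
    return fired

# Five keystrokes: a burst at 0, 100, 200 ms, a pause, then another burst.
# Only the keystrokes followed by a long-enough pause produce requests.
print(requests_fired([0, 100, 200, 1000, 1100]))  # [200, 1100]
```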
Keep the model loaded: By default, Ollama unloads models after 5 minutes of inactivity. To keep your code model always loaded:
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:7b", "keep_alive": -1}'
This tells Ollama to keep the model in memory indefinitely.
Adding Chat (Optional)
Autocomplete is only part of what Copilot offers. You might also want an AI to explain code, answer questions, or help with refactoring.
Add a chat model to your Continue config:
{
  "models": [
    {
      "title": "Qwen2.5-Coder Chat",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b-instruct"
    }
  ],
  "tabAutocompleteModel": {
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}
Download the instruct variant:
ollama pull qwen2.5-coder:7b-instruct
Now you can select code, right-click, and ask Continue to explain or modify it.
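Behind that menu, chat requests go to Ollama's /api/chat endpoint as a list of messages. A sketch of the request shape, assuming the instruct model pulled above (the prompt wrapping here is illustrative, not Continue's actual format):

```python
import json

def build_chat_request(code, question, model="qwen2.5-coder:7b-instruct"):
    """Build the JSON body for a single-turn chat request about a code snippet."""
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": f"{question}\n\n{code}"},
        ],
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("x = [i * i for i in range(10)]", "Explain this code.")
# Send with: POST http://localhost:11434/api/chat (requires Ollama running)
```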
How It Compares to Copilot
After a week of daily use, here’s an honest assessment:
Where local wins:
- Privacy: Your code stays on your machine
- Cost: Free forever after setup
- Offline: Works without internet
- Customization: Use any model, tweak any setting
Where Copilot wins:
- Speed: Cloud inference is typically faster
- Context: Copilot has better whole-project awareness
- Polish: The integration is more seamless
- Updates: New model improvements happen automatically
Where they’re roughly equal:
- Basic autocomplete quality for common patterns
- Function completion from docstrings
- Boilerplate generation
For proprietary codebases, the privacy benefit alone makes local worth it. For open-source work where you don’t mind cloud processing, Copilot’s polish might be worth $10/month.
Alternative: Tabby for Team Setups
If you need code completion for a team rather than just yourself, look at Tabby. It’s a self-hosted server that multiple developers can connect to.
Tabby advantages:
- One powerful server, many clients
- Repository indexing for better context
- Admin controls and audit logs
- No need for GPUs on developer machines
The tradeoff is more infrastructure complexity. For solo developers, Continue + Ollama is simpler.
Troubleshooting
Completions aren’t appearing:
- Check that Ollama is running: curl http://localhost:11434/api/tags
- Restart VS Code
- Check Continue’s output panel for errors (View > Output > Continue)
Completions are very slow:
- Use a smaller model
- Check if something else is using your GPU
- On laptops, plug in; battery mode throttles performance
Model keeps unloading:
- Use the keep_alive API call mentioned above
- Or add OLLAMA_KEEP_ALIVE=-1 to your environment
Out of memory errors:
- Close other applications
- Use a quantized model: qwen2.5-coder:7b-q4_K_M
- Use a smaller model size
What You Can Do
- Try it for a week: Give local code completion a real test before deciding. The first day feels slower; by day three, you’ve adapted.
- Experiment with models: New code models release regularly. DeepSeek-Coder, Codestral, and others all work with this setup.
- Contribute to open source: Both Continue and Ollama are open-source projects that benefit from community contributions and feedback.
- Tell your team: If you work somewhere with strict data policies, this setup might be the only way to get AI code assistance approved.
The era of “AI code completion requires sending code to the cloud” is over. The tools are here, they’re free, and they work.