Google Translate is free but sends every word to Google’s servers. DeepL costs $30/month for API access and covers only 28 languages. Running translation locally costs nothing per word, works offline, and never phones home.
The local translation landscape has split into two camps: dedicated translation models built specifically for the task, and general LLMs that happen to translate well. In January 2026, Google released TranslateGemma - purpose-built translation models running on the Gemma 3 architecture - and it changed the equation. Here’s what works at every VRAM tier.
## Two Approaches to Local Translation
### Dedicated Translation Models
These are trained specifically for translation. They translate directly between language pairs without the overhead of general chat capabilities.
| Model | Params | Languages | VRAM (Q4) | License |
|---|---|---|---|---|
| TranslateGemma 4B | 4B | 55 | ~3.3 GB | Open (Gemma) |
| TranslateGemma 12B | 12B | 55 | ~8.1 GB | Open (Gemma) |
| TranslateGemma 27B | 27B | 55 | ~17 GB | Open (Gemma) |
| NLLB-200 1.3B | 1.3B | 200 | ~2 GB | CC-BY-NC 4.0 |
| NLLB-200 3.3B | 3.3B | 200 | ~4 GB | CC-BY-NC 4.0 |
| MADLAD-400 3B | 3B | 400+ | ~4 GB | CC-BY 4.0 |
| Argos Translate | Small | 60+ | CPU only | MIT |
### General LLMs Used for Translation
General-purpose language models that produce natural, fluent translations - often better than dedicated models for literary or context-heavy text.
| Model | Params | Languages | VRAM (Q4) | Translation Strength |
|---|---|---|---|---|
| Qwen 3 8B | 8B | 201 (claimed) | 6.5 GB | Asian languages, technical content |
| Qwen 3 14B | 14B | 201 | 10.7 GB | Balanced quality |
| Aya Expanse 8B | 8B | 23 | ~6 GB | Low-resource languages |
| Aya Expanse 32B | 32B | 23 | ~22 GB | Highest multilingual quality |
| Gemma 3 27B | 27B | Multi | ~14 GB (QAT) | General European/Asian |
## The New Standard: TranslateGemma
TranslateGemma changed local translation in January 2026. These models are built on Gemma 3 but fine-tuned specifically for translation across 55 languages, including high-, mid-, and low-resource languages.
The key benchmark from the technical report: TranslateGemma 12B outperforms the baseline Gemma 3 27B on the WMT24++ translation benchmark, scoring 3.60 MetricX vs 4.04 (lower is better) - better quality with fewer parameters. The COMET22 scores tell the same story: TranslateGemma 12B scores 83.5, ahead of both the Gemma 3 12B baseline (81.6) and the Gemma 3 27B baseline (83.1).
All three sizes are available on Ollama right now:
```shell
ollama pull translategemma:4b    # 3.3 GB
ollama pull translategemma:12b   # 8.1 GB
ollama pull translategemma:27b   # 17 GB
```
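Once a model is pulled, you can drive it programmatically instead of through the CLI. A minimal sketch, assuming an Ollama server running on its default port (11434) and using the simple "Translate to X:" prompt style - stdlib only, no extra dependencies:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_payload(text: str, target_lang: str,
                  model: str = "translategemma:12b") -> dict:
    """Build a non-streaming /api/generate request for a translation prompt."""
    return {
        "model": model,
        "prompt": f"Translate to {target_lang}: {text}",
        "stream": False,  # get one complete JSON response instead of chunks
    }


def translate(text: str, target_lang: str,
              model: str = "translategemma:12b") -> str:
    """Send the prompt to a locally running Ollama server, return the translation."""
    data = json.dumps(build_payload(text, target_lang, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Call it as `translate("Privacy matters more than convenience.", "German")`. Swapping the `model` argument is all it takes to compare the 4B, 12B, and 27B variants on the same text.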
## Quality Comparison
Based on WMT24++ benchmark results (MetricX - lower is better, COMET22 - higher is better):
| Model | MetricX | COMET22 | Notes |
|---|---|---|---|
| TranslateGemma 27B | 3.09 | 84.4 | Best local option |
| TranslateGemma 12B | 3.60 | 83.5 | Beats Gemma 3 27B baseline |
| Gemma 3 27B (baseline) | 4.04 | 83.1 | Strong, but outperformed by TranslateGemma 12B |
| Gemma 3 12B (baseline) | 4.86 | 81.6 | Decent general LLM translation |
| TranslateGemma 4B | 5.32 | 80.1 | Impressive for 4B |
| Gemma 3 4B (baseline) | 6.97 | 77.2 | Functional but limited |
For context, Google Translate and DeepL typically score in the 2.5-3.5 MetricX range on European language pairs. TranslateGemma 27B is within striking distance.
## By VRAM Tier
### 8GB VRAM {#8gb}
GPUs: RTX 4060, RTX 3060 8GB, RTX 3070
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 4B | 3.3 GB | 55 | Good (MetricX 5.32) |
| Argos Translate (CPU) | 0 GB | 60+ | Functional |
| NLLB-200 1.3B | ~2 GB | 200 | Literal but accurate |
| Qwen 3 4B | 3.4 GB | 201 (claimed) | Natural prose |
Best pick: TranslateGemma 4B. At 3.3GB, it leaves 4.7GB free and translates 55 languages at quality that surpasses the Gemma 3 4B baseline by a wide margin (80.1 vs 77.2 COMET22). Purpose-built translation consistently beats general models at this size.
Maximum language coverage: NLLB-200 at 1.3B parameters handles 200 languages in under 2GB. The translations are more literal than TranslateGemma’s - you’ll get the meaning but not the poetry. Good for rare language pairs where TranslateGemma has no coverage.
Offline alternative: Argos Translate runs entirely on CPU with no GPU needed. It powers LibreTranslate and handles 60+ language pairs. Quality is a clear step below the LLM-based options, but zero VRAM means you can run it alongside any other model.
For all use cases at this level, see our 8GB VRAM complete guide.
### 12GB VRAM {#12gb}
GPUs: RTX 3060 12GB, RTX 4070
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 12B | 8.1 GB | 55 | Very Good (MetricX 3.60) |
| Qwen 3 8B | 6.5 GB | 201 | Natural but variable |
| Aya Expanse 8B | ~6 GB | 23 | Strong low-resource langs |
| NLLB-200 3.3B | ~4 GB | 200 | Better than 1.3B |
Best pick: TranslateGemma 12B. This is the inflection point. At 8.1GB and MetricX 3.60, TranslateGemma 12B beats the Gemma 3 27B baseline while using half the VRAM. For the 55 supported languages, this is near-commercial quality.
Asian language specialist: Qwen 3 8B alongside TranslateGemma. Qwen-MT outperforms comparably-sized models on Chinese, Japanese, and Korean translation. If your work involves CJK languages, run TranslateGemma for European pairs and Qwen for Asian pairs. At 14.6GB combined they won't both fit in 12GB at once, so keep one loaded at a time - Ollama swaps idle models out automatically as requests come in.
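The two-model split can live behind a tiny router that picks a model per target language. A minimal sketch - the model tags are assumptions, so match them to whatever you actually pulled:

```python
# Languages where a Qwen model is assumed to be the stronger translator.
CJK = {"zh", "ja", "ko"}


def pick_model(target_lang_code: str) -> str:
    """Return an Ollama model tag for a target language (ISO 639-1 code).

    CJK targets go to Qwen; everything else defaults to TranslateGemma.
    Tags like "qwen3:8b" are illustrative, not canonical.
    """
    if target_lang_code.lower() in CJK:
        return "qwen3:8b"
    return "translategemma:12b"
```

Prefix each request with the tag this returns and the right specialist handles each pair without you thinking about it.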
Low-resource champion: Aya Expanse 8B from Cohere covers 23 languages with a focus on underserved languages. It achieves 60-70% win rates against Gemma 2 and Llama 3.1 on multilingual benchmarks. If you translate Arabic, Hindi, Indonesian, or other languages where quality typically drops off, Aya is worth testing.
For all use cases at this level, see our 12GB VRAM complete guide.
### 16GB VRAM {#16gb}
GPUs: RTX 4060 Ti 16GB, RTX 5060 Ti 16GB, Arc A770
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 12B | 8.1 GB | 55 | Very Good |
| Qwen 3 14B | 10.7 GB | 201 | Excellent prose quality |
| TranslateGemma 4B + Qwen 3 8B | 3.3 + 6.5 GB | 55 + 201 | Dual-model coverage |
Best pick: TranslateGemma 12B + headroom. Same model as 12GB, but now with 8GB free for context. Load entire documents and translate them in one pass. The extra context is especially valuable for maintaining consistency across long texts.
Literary translation: Qwen 3 14B at 10.7GB. General LLMs produce more natural, context-aware translations than dedicated models. If you’re translating fiction, marketing copy, or anything where tone matters as much as accuracy, Qwen 3 14B’s broader language understanding produces more readable results. The trade-off: slower and less consistent on technical/formal text.
Dual setup: TranslateGemma 4B (3.3GB) for quick translations + Qwen 3 8B (6.5GB) for context-heavy Asian language work. Both loaded simultaneously at ~10GB total.
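Even with extra headroom, very long documents eventually exceed a model's context window. A paragraph-aware chunker keeps each request coherent - a rough sketch, where the character budget is a stand-in for real token counting (~4 characters per token is a common heuristic):

```python
def chunk_paragraphs(text: str, max_chars: int = 8000) -> list[str]:
    """Split a document on blank lines into chunks that fit a context budget.

    Paragraphs are never split mid-way, so each chunk remains a coherent
    unit for translation; rejoining the chunks reproduces the document.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # Flush the current chunk if adding this paragraph would overflow it.
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # +2 for the blank-line separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Translate each chunk in order and reuse a shared glossary or system prompt across calls to keep terminology consistent from chunk to chunk.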
For all use cases at this level, see our 16GB VRAM complete guide.
### 24GB VRAM {#24gb}
GPUs: RTX 3090, RTX 4090
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 27B | ~17 GB | 55 | Excellent (MetricX 3.09) |
| Aya Expanse 32B | ~22 GB | 23 | Best multilingual |
| Qwen 3 32B | 22.2 GB | 201 | Best natural prose |
Best pick: TranslateGemma 27B. At MetricX 3.09 and COMET22 84.4, this approaches Google Translate quality for supported languages. It fits in 17GB, leaving 7GB for other models or context.
Multilingual champion: Aya Expanse 32B at ~22GB. It scores 58.8 chrF++ and achieves 25% higher accuracy on low-resource language benchmarks compared to competing models. For Arabic, Hindi, Persian, Indonesian, and similar languages, this is the strongest local option.
Maximum naturalness: Qwen 3 32B at 22.2GB. For literary translation, marketing localization, or any task where the output needs to read like it was originally written in the target language, a large general LLM beats dedicated translation models.
For all use cases at this level, see our 24GB VRAM complete guide.
### 32GB VRAM {#32gb}
GPUs: RTX 5090
| Model | VRAM | Quality |
|---|---|---|
| TranslateGemma 27B (Q6_K) | ~22 GB | Near-lossless |
| TranslateGemma 27B + Qwen 3 14B | ~17 + 10.7 GB | Best combo |
Best pick: TranslateGemma 27B at Q6_K + Qwen 3 14B. Run the best translation model at near-lossless quantization alongside a strong general LLM for tasks that need more context awareness. Both loaded simultaneously at ~28GB.
For all use cases at this level, see our 32GB VRAM complete guide.
## Cross-Tier Summary
| Tier | Best Pick | Languages | Quality Level |
|---|---|---|---|
| 8GB | TranslateGemma 4B | 55 | Good |
| 12GB | TranslateGemma 12B | 55 | Very Good |
| 16GB | TranslateGemma 12B + headroom | 55 | Very Good |
| 24GB | TranslateGemma 27B | 55 | Excellent |
| 32GB | TranslateGemma 27B (Q6) | 55 | Near-commercial |
## When to Use What
TranslateGemma for: straightforward translation of the 55 supported languages. Best accuracy-per-VRAM, fastest processing.
Qwen 3 for: Asian languages (CJK), literary/creative translation, tasks needing 201-language breadth, and translation that requires understanding context deeply.
Aya Expanse for: low-resource languages, Arabic/Hindi/Indonesian translation, and any language pair where mainstream models typically underperform.
NLLB-200 for: maximum language coverage (200 languages), especially rare language pairs. Literal but accurate.
LibreTranslate / Argos for: zero-GPU setups, simple integrations via API, and environments where even CPU-only is acceptable.
## Quick Start
```shell
# Install and run TranslateGemma
ollama pull translategemma:12b

# Translate from English to French
ollama run translategemma:12b "Translate to French: The local AI revolution is making translation free and private."

# Or use the API for batch translation
curl http://localhost:11434/api/generate -d '{
  "model": "translategemma:12b",
  "prompt": "Translate to German: Privacy matters more than convenience.",
  "stream": false
}'
```
For a full LibreTranslate setup with web interface and API, see our Self-Host LibreTranslate guide.
## Honest Limits
Local translation models have real gaps compared to Google Translate and DeepL:
- Rare language pairs - quality drops sharply for less common combinations. Even NLLB-200 with 200 languages has thin coverage for many of them
- Idioms and cultural context - dedicated translation models tend to translate too literally. LLMs do better but aren’t perfect
- Consistency across documents - translating a 50-page document and keeping terminology consistent requires manual review
- Speed for bulk - translating thousands of documents is slower locally than API-based services with dedicated infrastructure
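The consistency gap in particular is easy to catch mechanically, even if fixing it still takes a human. A minimal glossary check - naive substring matching, purely illustrative; a real pipeline would lemmatize and handle inflection:

```python
def glossary_violations(source: str, translated: str,
                        glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Flag glossary terms the translation failed to render as required.

    glossary maps a source-language term to its one required translation.
    Returns (source_term, required_translation) pairs where the term
    appears in the source but the required rendering is absent from the
    output. Case-insensitive substring matching only.
    """
    violations = []
    src, out = source.lower(), translated.lower()
    for term, required in glossary.items():
        if term.lower() in src and required.lower() not in out:
            violations.append((term, required))
    return violations
```

Run it over every chunk of a long document and review only the flagged segments instead of rereading all fifty pages.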
For the 55 languages TranslateGemma supports, the quality gap with commercial services is shrinking fast. For everything else, it depends heavily on the specific language pair.