Google Translate is free but sends every word to Google’s servers. DeepL costs $30/month for API access and covers only 28 languages. Running translation locally costs nothing per word, works offline, and never phones home.
The local translation landscape has split into two camps: dedicated translation models built specifically for the task, and general LLMs that happen to translate well. In January 2026, Google released TranslateGemma - purpose-built translation models running on the Gemma 3 architecture - and it changed the equation. Here’s what works at every VRAM tier.
## Two Approaches to Local Translation
### Dedicated Translation Models
These are trained specifically for translation. They translate directly between language pairs without the overhead of general chat capabilities.
| Model | Params | Languages | VRAM (Q4) | License |
|---|---|---|---|---|
| TranslateGemma 4B | 4B | 55 | ~3.3 GB | Open (Gemma) |
| TranslateGemma 12B | 12B | 55 | ~8.1 GB | Open (Gemma) |
| TranslateGemma 27B | 27B | 55 | ~17 GB | Open (Gemma) |
| NLLB-200 1.3B | 1.3B | 200 | ~2 GB | CC-BY-NC 4.0 |
| NLLB-200 3.3B | 3.3B | 200 | ~4 GB | CC-BY-NC 4.0 |
| MADLAD-400 3B | 3B | 400+ | ~4 GB | CC-BY 4.0 |
| Argos Translate | Small | 60+ | CPU only | MIT |
### General LLMs Used for Translation
General-purpose language models that produce natural, fluent translations - often better than dedicated models for literary or context-heavy text.
| Model | Params | Languages | VRAM (Q4) | Translation Strength |
|---|---|---|---|---|
| Qwen 3 8B | 8B | 201 (claimed) | 6.5 GB | Asian languages, technical content |
| Qwen 3 14B | 14B | 201 | 10.7 GB | Balanced quality |
| Aya Expanse 8B | 8B | 23 | ~6 GB | Low-resource languages |
| Aya Expanse 32B | 32B | 23 | ~22 GB | Highest multilingual quality |
| Gemma 3 27B | 27B | Multi | ~14 GB (QAT) | General European/Asian |
## The New Standard: TranslateGemma
TranslateGemma changed local translation in January 2026. These models are built on Gemma 3 but fine-tuned specifically for translation across 55 languages, including high-, mid-, and low-resource languages.
The key benchmark from the technical report: TranslateGemma 12B outperforms the baseline Gemma 3 27B on the WMT24++ translation benchmark, scoring 3.60 MetricX vs 4.04 (lower is better) - better quality with fewer parameters. The COMET22 scores tell the same story: TranslateGemma 12B scores 83.5, ahead of both the Gemma 3 12B baseline (81.6) and the Gemma 3 27B baseline (83.1).
All three sizes are available on Ollama right now:
```shell
ollama pull translategemma:4b    # 3.3 GB
ollama pull translategemma:12b   # 8.1 GB
ollama pull translategemma:27b   # 17 GB
```
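Once a model is pulled, you can drive it programmatically instead of through the CLI. A minimal sketch, assuming an Ollama server running on its default port (11434) and using the simple "Translate to X:" prompt style - stdlib only, no extra dependencies:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_payload(text: str, target_lang: str,
                  model: str = "translategemma:12b") -> dict:
    """Build a non-streaming /api/generate request for a translation prompt."""
    return {
        "model": model,
        "prompt": f"Translate to {target_lang}: {text}",
        "stream": False,  # get one complete JSON response instead of chunks
    }


def translate(text: str, target_lang: str,
              model: str = "translategemma:12b") -> str:
    """Send the prompt to a locally running Ollama server, return the translation."""
    data = json.dumps(build_payload(text, target_lang, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Call it as `translate("Privacy matters more than convenience.", "German")`. Swapping the `model` argument is all it takes to compare the 4B, 12B, and 27B variants on the same text.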
## Quality Comparison
Based on WMT24++ benchmark results (MetricX - lower is better, COMET22 - higher is better):
| Model | MetricX | COMET22 | Notes |
|---|---|---|---|
| TranslateGemma 27B | 3.09 | 84.4 | Best local option |
| TranslateGemma 12B | 3.60 | 83.5 | Beats Gemma 3 27B baseline |
| Gemma 3 27B (baseline) | 4.04 | 83.1 | Strong, but outperformed by TranslateGemma 12B |
| Gemma 3 12B (baseline) | 4.86 | 81.6 | Decent general LLM translation |
| TranslateGemma 4B | 5.32 | 80.1 | Impressive for 4B |
| Gemma 3 4B (baseline) | 6.97 | 77.2 | Functional but limited |
For context, Google Translate and DeepL typically score in the 2.5-3.5 MetricX range on European language pairs. TranslateGemma 27B is within striking distance.
## By VRAM Tier
### 8GB VRAM {#8gb}
GPUs: RTX 4060, RTX 3060 8GB, RTX 3070
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 4B | 3.3 GB | 55 | Good (MetricX 5.32) |
| Argos Translate (CPU) | 0 GB | 60+ | Functional |
| NLLB-200 1.3B | ~2 GB | 200 | Literal but accurate |
| Qwen 3 4B | 3.4 GB | 201 (claimed) | Natural prose |
Best pick: TranslateGemma 4B. At 3.3GB, it leaves 4.7GB free and translates 55 languages at quality that surpasses the Gemma 3 4B baseline by a wide margin (80.1 vs 77.2 COMET22). Purpose-built translation consistently beats general models at this size.
Maximum language coverage: NLLB-200 at 1.3B parameters handles 200 languages in under 2GB. The translations are more literal than TranslateGemma’s - you’ll get the meaning but not the poetry. Good for rare language pairs where TranslateGemma has no coverage.
Offline alternative: Argos Translate runs entirely on CPU with no GPU needed. It powers LibreTranslate and handles 60+ language pairs. Quality is a clear step below the LLM-based options, but zero VRAM means you can run it alongside any other model.
For all use cases at this level, see our 8GB VRAM complete guide.
### 12GB VRAM {#12gb}
GPUs: RTX 3060 12GB, RTX 4070
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 12B | 8.1 GB | 55 | Very Good (MetricX 3.60) |
| Qwen 3 8B | 6.5 GB | 201 | Natural but variable |
| Aya Expanse 8B | ~6 GB | 23 | Strong low-resource langs |
| NLLB-200 3.3B | ~4 GB | 200 | Better than 1.3B |
Best pick: TranslateGemma 12B. This is the inflection point. At 8.1GB and MetricX 3.60, TranslateGemma 12B beats the Gemma 3 27B baseline while using half the VRAM. For the 55 supported languages, this is near-commercial quality.
Asian language specialist: Qwen 3 8B alongside TranslateGemma. Qwen-MT outperforms comparably-sized models on Chinese, Japanese, and Korean translation. If your work involves CJK languages, run TranslateGemma for European pairs and Qwen for Asian pairs. At 14.6GB combined they won't both fit in 12GB at once, so keep one loaded at a time - Ollama swaps idle models out automatically as requests come in.
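The two-model split can live behind a tiny router that picks a model per target language. A minimal sketch - the model tags are assumptions, so match them to whatever you actually pulled:

```python
# Languages where a Qwen model is assumed to be the stronger translator.
CJK = {"zh", "ja", "ko"}


def pick_model(target_lang_code: str) -> str:
    """Return an Ollama model tag for a target language (ISO 639-1 code).

    CJK targets go to Qwen; everything else defaults to TranslateGemma.
    Tags like "qwen3:8b" are illustrative, not canonical.
    """
    if target_lang_code.lower() in CJK:
        return "qwen3:8b"
    return "translategemma:12b"
```

Prefix each request with the tag this returns and the right specialist handles each pair without you thinking about it.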
Low-resource champion: Aya Expanse 8B from Cohere covers 23 languages with a focus on underserved languages. It achieves 60-70% win rates against Gemma 2 and Llama 3.1 on multilingual benchmarks. If you translate Arabic, Hindi, Indonesian, or other languages where quality typically drops off, Aya is worth testing.
For all use cases at this level, see our 12GB VRAM complete guide.
### 16GB VRAM {#16gb}
GPUs: RTX 4060 Ti 16GB, RTX 5060 Ti 16GB, Arc A770
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 12B | 8.1 GB | 55 | Very Good |
| Qwen 3 14B | 10.7 GB | 201 | Excellent prose quality |
| TranslateGemma 4B + Qwen 3 8B | 3.3 + 6.5 GB | 55 + 201 | Dual-model coverage |
Best pick: TranslateGemma 12B + headroom. Same model as 12GB, but now with 8GB free for context. Load entire documents and translate them in one pass. The extra context is especially valuable for maintaining consistency across long texts.
Literary translation: Qwen 3 14B at 10.7GB. General LLMs produce more natural, context-aware translations than dedicated models. If you’re translating fiction, marketing copy, or anything where tone matters as much as accuracy, Qwen 3 14B’s broader language understanding produces more readable results. The trade-off: slower and less consistent on technical/formal text.
Dual setup: TranslateGemma 4B (3.3GB) for quick translations + Qwen 3 8B (6.5GB) for context-heavy Asian language work. Both loaded simultaneously at ~10GB total.
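Even with extra headroom, very long documents eventually exceed a model's context window. A paragraph-aware chunker keeps each request coherent - a rough sketch, where the character budget is a stand-in for real token counting (~4 characters per token is a common heuristic):

```python
def chunk_paragraphs(text: str, max_chars: int = 8000) -> list[str]:
    """Split a document on blank lines into chunks that fit a context budget.

    Paragraphs are never split mid-way, so each chunk remains a coherent
    unit for translation; rejoining the chunks reproduces the document.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # Flush the current chunk if adding this paragraph would overflow it.
        if current and size + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # +2 for the blank-line separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Translate each chunk in order and reuse a shared glossary or system prompt across calls to keep terminology consistent from chunk to chunk.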
For all use cases at this level, see our 16GB VRAM complete guide.
### 24GB VRAM {#24gb}
GPUs: RTX 3090, RTX 4090
| Model | VRAM | Languages | Quality |
|---|---|---|---|
| TranslateGemma 27B | ~17 GB | 55 | Excellent (MetricX 3.09) |
| Aya Expanse 32B | ~22 GB | 23 | Best multilingual |
| Qwen 3 32B | 22.2 GB | 201 | Best natural prose |
Best pick: TranslateGemma 27B. At MetricX 3.09 and COMET22 84.4, this approaches Google Translate quality for supported languages. It fits in 17GB, leaving 7GB for other models or context.
Multilingual champion: Aya Expanse 32B at ~22GB. It scores 58.8 chrF++ and achieves 25% higher accuracy on low-resource language benchmarks compared to competing models. For Arabic, Hindi, Persian, Indonesian, and similar languages, this is the strongest local option.
Maximum naturalness: Qwen 3 32B at 22.2GB. For literary translation, marketing localization, or any task where the output needs to read like it was originally written in the target language, a large general LLM beats dedicated translation models.
For all use cases at this level, see our 24GB VRAM complete guide.
### 32GB VRAM {#32gb}
GPUs: RTX 5090
| Model | VRAM | Quality |
|---|---|---|
| TranslateGemma 27B (Q6_K) | ~22 GB | Near-lossless |
| TranslateGemma 27B + Qwen 3 14B | ~17 + 10.7 GB | Best combo |
Best pick: TranslateGemma 27B at Q6_K + Qwen 3 14B. Run the best translation model at near-lossless quantization alongside a strong general LLM for tasks that need more context awareness. Both loaded simultaneously at ~28GB.
For all use cases at this level, see our 32GB VRAM complete guide.
## Cross-Tier Summary
| Tier | Best Pick | Languages | Quality Level |
|---|---|---|---|
| 8GB | TranslateGemma 4B | 55 | Good |
| 12GB | TranslateGemma 12B | 55 | Very Good |
| 16GB | TranslateGemma 12B + headroom | 55 | Very Good |
| 24GB | TranslateGemma 27B | 55 | Excellent |
| 32GB | TranslateGemma 27B (Q6) | 55 | Near-commercial |
## When to Use What
TranslateGemma for: straightforward translation of the 55 supported languages. Best accuracy-per-VRAM, fastest processing.
Qwen 3 for: Asian languages (CJK), literary/creative translation, tasks needing 201-language breadth, and translation that requires understanding context deeply.
Aya Expanse for: low-resource languages, Arabic/Hindi/Indonesian translation, and any language pair where mainstream models typically underperform.
NLLB-200 for: maximum language coverage (200 languages), especially rare language pairs. Literal but accurate.
LibreTranslate / Argos for: zero-GPU setups, simple integrations via API, and environments where even CPU-only is acceptable.
## Quick Start
```shell
# Install and run TranslateGemma
ollama pull translategemma:12b

# Translate from English to French
ollama run translategemma:12b "Translate to French: The local AI revolution is making translation free and private."

# Or use the API for batch translation
curl http://localhost:11434/api/generate -d '{
  "model": "translategemma:12b",
  "prompt": "Translate to German: Privacy matters more than convenience.",
  "stream": false
}'
```
For a full LibreTranslate setup with web interface and API, see our Self-Host LibreTranslate guide.
## Honest Limits
Local translation models have real gaps compared to Google Translate and DeepL:
- Rare language pairs - quality drops sharply for less common combinations. Even NLLB-200 with 200 languages has thin coverage for many of them
- Idioms and cultural context - dedicated translation models tend to translate too literally. LLMs do better but aren’t perfect
- Consistency across documents - translating a 50-page document and keeping terminology consistent requires manual review
- Speed for bulk - translating thousands of documents is slower locally than API-based services with dedicated infrastructure
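The consistency gap in particular is easy to catch mechanically, even if fixing it still takes a human. A minimal glossary check - naive substring matching, purely illustrative; a real pipeline would lemmatize and handle inflection:

```python
def glossary_violations(source: str, translated: str,
                        glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Flag glossary terms the translation failed to render as required.

    glossary maps a source-language term to its one required translation.
    Returns (source_term, required_translation) pairs where the term
    appears in the source but the required rendering is absent from the
    output. Case-insensitive substring matching only.
    """
    violations = []
    src, out = source.lower(), translated.lower()
    for term, required in glossary.items():
        if term.lower() in src and required.lower() not in out:
            violations.append((term, required))
    return violations
```

Run it over every chunk of a long document and review only the flagged segments instead of rereading all fifty pages.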
For the 55 languages TranslateGemma supports, the quality gap with commercial services is shrinking fast. For everything else, it depends heavily on the specific language pair.