The AI industry’s biggest labs have spent billions training models that excel at English. Meanwhile, the 4.5 billion people who speak Bengali, Tamil, Urdu, Swahili, or any of dozens of other languages have been stuck with tools that barely understand them.
Cohere just released Tiny Aya, a family of open-weight models supporting over 70 languages that runs locally on consumer hardware - including phones - without requiring an internet connection. The announcement came at the India AI Impact Summit, and the timing wasn’t coincidental.
What Tiny Aya Actually Does
The base model has 3.35 billion parameters. That’s small enough to run on a laptop or phone, yet it outperforms larger multilingual models on languages that tech giants have historically ignored.
On the WMT24++ translation benchmark, Tiny Aya Global outperforms Gemma3-4B in 46 of 61 languages. On mathematical reasoning tasks for African languages (GlobalMGSM benchmark), Tiny Aya hit 39.2% accuracy - crushing Gemma3-4B at 17.6% and Qwen3-4B at a dismal 6.25%.
The difference is stark: models trained primarily on English and Chinese data fall apart when asked to reason in Yoruba or solve math problems in Swahili.
Regional Variants for Local Needs
Cohere isn’t shipping a single one-size-fits-all model. The release includes five versions:
- TinyAya-Base: The pretrained foundation
- TinyAya-Global: Instruction-tuned for balanced performance across 67 languages
- TinyAya-Earth: Optimized for African and West Asian languages
- TinyAya-Fire: Focused on South Asian languages including Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi
- TinyAya-Water: Covers Asia Pacific and European languages
The regional variants aren’t just marketing. Different language families have different grammatical structures, scripts, and tokenization challenges. A model fine-tuned for South Asian languages will handle Bengali’s complex verb conjugations better than a generic multilingual model.
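In practice, choosing a variant reduces to a small lookup over language groups. The sketch below is a hypothetical helper, not anything Cohere ships: the South Asian set comes from the list above, while the other language groupings are illustrative guesses at each variant’s coverage.

```python
# Hypothetical helper: map an ISO 639-1 language code to the Tiny Aya
# variant whose regional focus covers it. The South Asian set mirrors
# the article; the other sets are illustrative assumptions.
SOUTH_ASIAN = {"bn", "hi", "pa", "ur", "gu", "ta", "te", "mr"}            # TinyAya-Fire
AFRICAN_WEST_ASIAN = {"sw", "yo", "ha", "am", "ar", "fa"}                 # TinyAya-Earth
ASIA_PACIFIC_EUROPEAN = {"ja", "ko", "id", "vi", "th", "fr", "de", "es"}  # TinyAya-Water

def pick_variant(lang_code: str) -> str:
    """Return the regional variant best matched to a language code,
    falling back to the balanced Global model."""
    if lang_code in SOUTH_ASIAN:
        return "TinyAya-Fire"
    if lang_code in AFRICAN_WEST_ASIAN:
        return "TinyAya-Earth"
    if lang_code in ASIA_PACIFIC_EUROPEAN:
        return "TinyAya-Water"
    return "TinyAya-Global"
```

If your users span several of these groups, TinyAya-Global is the safer default; the regional variants buy you depth in one language family at the cost of breadth.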
Privacy by Design
Here’s what matters for anyone concerned about data sovereignty: Tiny Aya runs entirely offline on local devices. Your data never leaves your hardware. No API calls to a company’s servers. No conversations logged and stored indefinitely. No uncertainty about how your inputs might be used for training future models.
For users in regions with limited connectivity, this is a practical necessity. For users anywhere who want AI assistance without the surveillance tradeoffs, it’s a feature most cloud services can’t match.
The models are available on HuggingFace, Kaggle, and Ollama. If you’re running Ollama, deploying is straightforward.
How They Built It
Cohere’s training approach tackled three problems that plague multilingual AI:
Tokenization: Most tokenizers were built for English. They chop non-Latin scripts into far more tokens than necessary, making inference slower and context windows effectively shorter for non-English languages. Cohere reports Tiny Aya achieves “the most efficient tokenization across the vast majority of evaluated languages”.
Synthetic data naturalization: AI-generated training data often sounds stilted. Cohere developed techniques to make synthetic examples sound natural in each target language rather than like machine-translated English.
Targeted merging: Rather than training a single model on everything at once, they used merging strategies to preserve linguistic nuances that get washed out in standard multilingual training.
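Cohere hasn’t published the details of its merging recipe, but the simplest form of the general technique is linear weight averaging across specialist checkpoints (sometimes called a “model soup”). The sketch below illustrates only that idea, on toy parameter dicts rather than real model weights.

```python
# A minimal sketch of linear model merging: a weighted average of the
# parameters of specialist checkpoints. Cohere's actual merging
# strategy is not described in detail; this shows the general idea.
def merge_checkpoints(state_dicts, weights=None):
    """Weighted average of parameter dicts mapping name -> list of floats."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Two toy "checkpoints" fine-tuned on different language groups:
ckpt_a = {"layer.weight": [1.0, 2.0]}
ckpt_b = {"layer.weight": [3.0, 6.0]}
print(merge_checkpoints([ckpt_a, ckpt_b]))  # {'layer.weight': [2.0, 4.0]}
```

The appeal for multilingual training is that each specialist keeps the nuances of its language family, and the merge combines them instead of letting high-resource languages dominate a single joint training run.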
The entire post-training process ran on a single cluster of 64 H100 GPUs. That’s modest compute for a model covering 70+ languages. It suggests efficient training methods that other labs - or well-resourced research groups - could replicate.
Why This Matters Beyond India
Cohere announced Tiny Aya at an Indian AI summit, and South Asian languages get prominent billing. But the implications extend further.
Most AI development concentrates in the US and China, optimizing for languages spoken in those markets. The EU AI Act creates regulatory pressure but limited language diversity. Africa, Latin America, Southeast Asia, and South Asia collectively have billions of potential AI users who’ve been afterthoughts in model development.
Tiny Aya represents something different: a serious effort at multilingual AI that works offline, respects user privacy, and doesn’t require users to send their data to Silicon Valley servers.
Is a 3.35B model as capable as GPT-4 or Claude? Of course not. But for translation, basic reasoning, summarization, and assistance tasks in languages that frontier models handle poorly, a smaller model optimized for those languages may actually perform better.
What You Can Do
If you need multilingual AI that stays local:
- Install Ollama if you haven’t already
- Pull the Tiny Aya variant that matches your languages
- Run it without sending data anywhere
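Those steps can be sketched as shell commands. The model tag below is a placeholder I’ve invented for illustration; check the Ollama model library for the published name before pulling.

```shell
# Install Ollama (Linux/macOS installer script from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a Tiny Aya variant and run it locally. The tag is a
# placeholder, not a confirmed name; verify it in the Ollama library.
ollama pull tiny-aya-global
ollama run tiny-aya-global "Translate to Swahili: Good morning, friends."
```

Once the weights are downloaded, both commands work with no network connection and nothing leaves the machine.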
If you work with underserved languages: Cohere is releasing their multilingual fine-tuning datasets and evaluation benchmarks for community use. If you’re building language tools, these resources help.
If you’re evaluating AI options for regions with limited connectivity: The offline-first design matters more than benchmark scores when your users can’t reliably reach cloud servers.
The Business Context
Cohere hit $240 million in annual recurring revenue by end of 2025 with 50% quarter-over-quarter growth. CEO Aidan Gomez has indicated they plan to go public soon. Releasing open-weight models while building toward an IPO is an interesting strategy - it suggests Cohere sees value in ecosystem building rather than model licensing alone.
The Tiny Aya release also positions Cohere for the enterprise market in India and other Global South countries. Giving away the small models creates goodwill and adoption. Larger deployments and fine-tuning services presumably come with enterprise pricing.
Bottom Line
Tiny Aya won’t replace GPT-4 for English speakers. It’s not trying to. Instead, it’s a capable multilingual model that runs offline on consumer hardware, handles languages that frontier models neglect, and keeps your data on your device.
For the billions of people whose languages have been underserved by AI development, that’s more useful than another marginal improvement on English benchmarks.