Open-source AI keeps eating the frontier. This week, Alibaba proved bigger isn’t always better by releasing a model that outperforms its own trillion-parameter predecessor at a fraction of the cost. Meanwhile, Mistral dropped a 675B model under Apache 2.0, and local inference tools continue their explosive growth. Here’s what happened.
Qwen3.5: The Model That Beat Itself
On February 15, Alibaba released Qwen3.5-397B-A17B - and promptly embarrassed its own previous flagship.
The numbers tell the story: 397 billion total parameters, but only 17 billion active per token thanks to a mixture-of-experts architecture. Compare that to Qwen3-Max, Alibaba’s trillion-parameter behemoth released just weeks earlier.
The result? Qwen3.5 beats Qwen3-Max on most benchmarks while costing 60% less to run and achieving 8.6x to 19x higher decoding throughput.
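The mechanics behind those numbers: a mixture-of-experts layer routes each token to only a few of its many expert networks, so most parameters sit idle on any given forward pass. A minimal top-k routing sketch (illustrative only, not Qwen's actual implementation; the softmax router, toy experts, and k=2 are assumptions):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_logits, k=2):
    """Route a token through only the top-k experts.

    experts: list of callables (stand-ins for expert feed-forward nets).
    router_logits: one learned score per expert for this token.
    Only k expert functions actually run; the rest are skipped, which
    is why active parameters can be far below total parameters.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Weighted combination of the selected experts' outputs.
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Toy example: 8 experts, each a simple scalar function.
experts = [(lambda x, s=s: s * x) for s in range(1, 9)]
out = moe_forward(1.0, experts, router_logits=[0.1, 2.0, 0.3, 1.5, 0, 0, 0, 0], k=2)
```

Here the router's top two logits pick experts 1 and 3, and the output blends just those two; the other six experts never execute.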
What else ships with it:
- 1 million token context window - enough for entire codebases or multi-day conversation histories
- 201 languages and dialects - up from 82 in the previous generation
- Native vision capabilities - can process images and video alongside text
- Agent-first design - optimized for autonomous task execution
The model is available under Apache 2.0, which permits unrestricted commercial use (subject only to the license's standard attribution and notice requirements). Weights are on Hugging Face and ModelScope. Alibaba Cloud offers API access, but you can also run it yourself if you have the hardware.
For context: a trillion-parameter model typically needs a cluster of high-end GPUs just to hold its weights. A 397B model still has to keep all 397 billion parameters in memory, but only 17 billion participate in each forward pass, so per-token compute drops sharply even though the memory footprint remains substantial. That efficiency translates directly into cost savings and deployment flexibility.
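A back-of-envelope comparison makes the trade-off concrete. The figures below use common rules of thumb, roughly one byte per parameter for FP8 weights and roughly two FLOPs per active parameter per decoded token, and assume a dense trillion-parameter baseline purely for illustration:

```python
def model_footprint_gb(total_params_b, bytes_per_param):
    """Rough weight-storage estimate: billions of params * bytes/param = GB.
    Ignores activations, KV cache, and runtime overhead."""
    return total_params_b * bytes_per_param

def flops_per_token(active_params_b):
    """~2 FLOPs per active parameter per decoded token (rule of thumb)."""
    return 2 * active_params_b * 1e9

# MoE model vs. an (assumed dense) trillion-parameter baseline, FP8 weights.
moe_mem = model_footprint_gb(397, 1)     # ~397 GB just for weights
dense_mem = model_footprint_gb(1000, 1)  # ~1,000 GB
compute_ratio = flops_per_token(17) / flops_per_token(1000)  # ~0.017
```

The memory gap is real but modest; the decode-compute gap is nearly 60x, which is where throughput and serving-cost advantages come from.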
Mistral Large 3: Frontier Goes Apache 2.0
Mistral announced Mistral 3, including Mistral Large 3 - a 675B parameter model released under Apache 2.0.
The architecture: mixture-of-experts with 41B active parameters out of 675B total. Trained from scratch on 3,000 NVIDIA H200 GPUs. Context window of 256K tokens.
Key capabilities:
- Multimodal by default - processes text, images, and documents natively
- Code generation - trained extensively on programming tasks
- Agent support - function calling and tool use built-in
Mistral claims the model delivers 92% of GPT-5.2’s performance at roughly 15% of the price. Whether those claims hold up across all use cases remains to be tested, but having another serious open-weight frontier model benefits the ecosystem.
The Ministral 3 series also shipped: smaller models at 3B, 8B, and 14B parameters designed for edge deployment. These can run on single GPUs, making them practical for robotics, drones, and on-device applications.
Everything is on Hugging Face. Cloud deployments available through Amazon Bedrock, IBM watsonx, and Mistral’s own platform.
Local Inference Tools Keep Growing
The GitHub numbers show where developer attention is going:
- Ollama crossed 162,000 stars - still the simplest way to run LLMs locally
- Dify hit 130,000 stars - no-code AI workflow builder
- OpenClaw reached 188,000 stars (though security concerns have tempered enthusiasm)
What’s driving adoption? Practical factors:
Cost. Cloud inference adds up fast for production workloads. Local inference means paying once for hardware instead of continuously for API calls.
Privacy. Healthcare, legal, financial - sectors where data can’t leave your infrastructure. Local models are often the only compliant option.
Latency. Network round-trips add latency that matters for real-time applications. Local inference eliminates that entirely.
Control. No API deprecations, rate limits, or surprise policy changes. Your model keeps working regardless of what happens to the provider.
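The cost factor above reduces to a simple break-even calculation: how many months of API spend does the hardware purchase replace? All numbers below are hypothetical placeholders, not real prices:

```python
def breakeven_months(hardware_cost, monthly_tokens_m, price_per_m_tokens,
                     monthly_power_cost=0.0):
    """Months until buying hardware beats paying per API token.

    hardware_cost: one-time GPU/server cost in dollars.
    monthly_tokens_m: millions of tokens processed per month.
    price_per_m_tokens: API price per million tokens.
    monthly_power_cost: ongoing electricity/hosting cost for local inference.
    """
    api_monthly = monthly_tokens_m * price_per_m_tokens
    saved = api_monthly - monthly_power_cost
    if saved <= 0:
        return float("inf")  # at this volume, local never pays off
    return hardware_cost / saved

# Hypothetical: $8,000 workstation, 500M tokens/month,
# $2 per million API tokens, $60/month in power.
months = breakeven_months(8000, 500, 2.0, 60)  # ~8.5 months
```

Below some usage volume the answer is "never," which is why the cost argument applies to sustained production workloads rather than occasional experimentation.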
The ecosystem has matured considerably. A year ago, running local models required substantial technical knowledge. Now Ollama handles everything with a single command: ollama run llama3.2 pulls the model and starts serving it. The barrier to entry keeps dropping.
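Beyond the interactive CLI, Ollama also exposes a local HTTP API (port 11434 by default). A minimal client sketch against its /api/generate endpoint, assuming a model has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    """Payload for Ollama's /api/generate endpoint; stream=False asks
    for one complete JSON response instead of a token stream."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """Send a prompt to a locally running Ollama server and return the text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires `ollama run llama3.2` (or `ollama pull llama3.2`) beforehand:
# print(generate("llama3.2", "Summarize mixture-of-experts in one sentence."))
```

Because the API is plain HTTP on localhost, swapping a cloud provider for a local model is often a one-line URL change in existing code.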
Also Worth Noting
ClawHub’s skill registry hit 5,700 community-built integrations - calendar management, code review, research automation, and more. The plugin ecosystem around AI agents is now substantial.
GLM-5 adoption continues expanding after last week’s MIT-licensed release. The 744B model running on Huawei Ascend chips demonstrates viable alternatives to NVIDIA hardware for frontier training.
Browser Use, the framework for AI browser automation, maintains steady growth. Agents that can navigate web applications are becoming standard toolkit items.
What This Means
Three trends are converging:
Efficiency beats scale. Qwen3.5 proving that a smaller, smarter model can beat a larger one isn’t surprising - it’s what the research predicted. But seeing it demonstrated at the trillion-parameter scale by the same company matters. The days of “just add more parameters” are over.
Open-weight frontier is real. Between GLM-5 (744B, MIT), Qwen3.5 (397B, Apache 2.0), and Mistral Large 3 (675B, Apache 2.0), there are now multiple permissively-licensed models competing at frontier performance levels. You don’t need proprietary APIs to access cutting-edge capabilities anymore.
Local is going mainstream. Ollama at 162K stars isn’t a niche project - it’s becoming standard developer infrastructure. The audience for local AI has expanded far beyond privacy advocates and enthusiasts.
For builders: the calculus has shifted. The question isn’t whether open models are good enough. It’s which open model fits your specific requirements, and whether you want to run it yourself or pay for hosted inference.
The frontier keeps moving, and open-source is keeping pace.