Zhipu AI has released GLM-5, a 744-billion-parameter open-source model that’s raising eyebrows for two reasons: it matches frontier models from OpenAI and Anthropic on key benchmarks, and it was trained entirely on Huawei chips without a single NVIDIA GPU.
The model dropped on February 11, 2026 under an MIT license, making it fully open for commercial use. It’s available now via OpenRouter, through the company’s chat.z.ai platform, or locally if you have the hardware.
## What Makes GLM-5 Different
This isn’t just another Chinese LLM announcement. GLM-5 represents several firsts:
Built Without Western Hardware: The entire model was trained on a cluster of 100,000 Huawei Ascend 910B chips using the MindSpore framework. No NVIDIA H100s, no AMD MI300s. Given that Zhipu AI was added to the US Entity List in January 2025, this was both necessity and proof of concept.
Efficient Despite Scale: It uses a Mixture-of-Experts architecture with 256 experts, but only about 44 billion parameters are active per token (a router selects 8 experts for each one). Inference cost scales with the active parameters rather than the full 744 billion, which keeps the model practical to serve while the full parameter count stores the knowledge.
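To make the routing concrete, here is a minimal top-k gating sketch in plain Python. This illustrates standard MoE routing in general, not GLM-5’s actual implementation; only the 256-expert / top-8 figures come from the article, and the random router logits are purely illustrative.

```python
import math
import random

def route_token(logits, top_k=8):
    """Standard top-k gating: keep the k largest router logits,
    then softmax over just those experts to get mixing weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-top_k:]
    m = max(logits[i] for i in top)              # subtract max for stability
    exp = [math.exp(logits[i] - m) for i in top]
    z = sum(exp)
    return top, [e / z for e in exp]

random.seed(0)
n_experts = 256                                   # total experts, per the article
logits = [random.gauss(0, 1) for _ in range(n_experts)]  # router output for one token
experts, weights = route_token(logits)
print(len(experts), round(sum(weights), 6))       # 8 1.0
```

Only the 8 selected experts run their feed-forward computation for that token, which is why the effective compute tracks 44B active parameters rather than 744B total.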
Record Low Hallucinations: Zhipu developed a new RL training framework called “Slime” that they claim reduced hallucination rates from 90% to 34%, and they’ve open-sourced it on GitHub.
## How It Compares
The benchmarks paint an interesting picture:
| Task | GLM-5 | Claude Opus 4.6 | GPT-5.2 |
|---|---|---|---|
| SWE-bench Verified (coding) | 77.8% | 80.8% | 76.2% |
| Humanity’s Last Exam | 50.4% | 46.2% | 47.8% |
| BrowseComp (web search) | 75.9% | 68.4% | 72.1% |
| AIME 2025 (math) | 88.7% | 92.3% | 100% |
GLM-5 leads on research and factual accuracy tasks but trails on agentic coding and pure math. It’s not the best at everything, but it’s competitive across the board while being fully open-source.
The more practical finding: GLM-5 scores -1 on the AA-Omniscience Index, meaning it’s the top-performing model for knowing when to say “I don’t know” rather than making things up.
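Abstention-aware indices of this kind typically reward a model for declining to answer instead of guessing. The sketch below uses hypothetical weights (+1 correct, -1 wrong, 0 abstain) to show the mechanism; it is not the AA-Omniscience formula itself.

```python
def omniscience_style_score(answers):
    """Toy abstention-aware scoring: +1 for a correct answer,
    -1 for a wrong one, 0 for "I don't know" (hypothetical weights).
    Averaging over questions rewards abstaining over guessing."""
    points = {"correct": 1, "wrong": -1, "abstain": 0}
    return sum(points[a] for a in answers) / len(answers)

# A model that guesses wrong scores worse than one that abstains:
guesser   = ["correct", "wrong", "wrong", "wrong"]
abstainer = ["correct", "abstain", "abstain", "wrong"]
print(omniscience_style_score(guesser))    # -0.5
print(omniscience_style_score(abstainer))  # 0.0
```

Under any scoring of this shape, a model that says “I don’t know” when unsure can outrank a more knowledgeable model that guesses confidently.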
## The Pricing
This is where it gets interesting for actual users:
- Input: ~$1.00 per million tokens
- Output: ~$3.20 per million tokens
Compare that to Claude Opus at $15/$75 or GPT-5.2 at roughly $2.50/$10. On a typical mixed workload, GLM-5 works out to roughly 3x cheaper than GPT-5.2 and 20x or more cheaper than Opus.
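The comparison is easy to check with quick arithmetic. Prices are the ones quoted above; the 10k-tokens-in / 2k-tokens-out request size is an arbitrary example, not a measured workload.

```python
def request_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost of one request, with prices in $ per million tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# ($/M input, $/M output), as quoted in the article
prices = {
    "GLM-5":       (1.00, 3.20),
    "GPT-5.2":     (2.50, 10.00),
    "Claude Opus": (15.00, 75.00),
}

# Example request: 10k tokens in, 2k tokens out
for model, (p_in, p_out) in prices.items():
    print(f"{model}: ${request_cost(10_000, 2_000, p_in, p_out):.4f}")
```

At these prices the example request costs about $0.016 on GLM-5, $0.045 on GPT-5.2, and $0.30 on Claude Opus.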
The free tier at chat.z.ai requires no credit card and includes enough credits to actually try it, which is refreshing compared to the $20/month minimums elsewhere.
## Running It Yourself
GLM-5 is on Hugging Face and Ollama. To run it via vLLM:
```shell
vllm serve zai-org/GLM-5-FP8 --tensor-parallel-size 8
```
You’ll need significant GPU resources given the model size, but the MoE architecture means inference is more tractable than the raw parameter count suggests.
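vLLM exposes an OpenAI-compatible HTTP API, so once the server is running you can query it with stdlib-only Python. This is a minimal sketch: the localhost port is vLLM’s default, and the `max_tokens` value is an arbitrary choice; adjust both for your setup.

```python
import json
import urllib.request

def build_chat_request(prompt, model="zai-org/GLM-5-FP8"):
    """Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt, base_url="http://localhost:8000"):
    """POST a chat request to a locally running `vllm serve` instance
    and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server up, `chat("Summarize the GLM-5 release in one sentence.")` returns the model’s reply; any OpenAI-compatible client library pointed at the same base URL works too.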
## What This Means
For users who care about privacy and data sovereignty, GLM-5 opens interesting options. You can self-host a frontier-class model, run it through non-US infrastructure, or simply benefit from the competition driving prices down.
For the AI industry, it’s proof that US export controls on chips aren’t preventing China from building competitive models. Huawei’s Ascend chips reportedly run at 60-80% the efficiency of an H100 for training, which apparently is enough.
Whether you trust a model trained in China on your sensitive data is a personal risk calculation. But the existence of a strong, open-source, MIT-licensed model that rivals Claude and GPT gives everyone more options.
## The Bottom Line
GLM-5 is worth trying if you’re:
- Running up large API bills on Claude or GPT
- Looking for an open-source model you can actually self-host
- Doing research tasks where its low hallucination rate matters
- Curious about what 744 billion parameters feels like
It won’t replace Claude for complex coding tasks or GPT-5 for pure math, but for the price point and the license, it’s now a serious contender.