The gap between proprietary and open-source AI models has been closing steadily. With MiniMax’s M2.5 release in February 2026, that gap has effectively disappeared - at least for coding and agentic tasks.
The Chinese AI startup’s latest model ranks fourth on the OpenHands Index, behind only Claude Opus and GPT-5.2 Codex. More importantly, it’s the first open-weights model to exceed Claude Sonnet across software engineering benchmarks.
The price difference is staggering: M2.5 costs roughly one-twentieth what you’d pay for Claude Opus.
The Numbers
M2.5 achieves 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp. For office productivity tasks (GDPval-MM), it scores a 59.0% average win rate against competing models.
According to OpenHands testing, the model excels at greenfield app development, issue resolution, and software testing. Their assessment: “basically a two-horse race between Claude Opus on the most capable but pricy side, and M2.5 on the very inexpensive and still highly capable side.”
The cost breakdown tells the story. M2.5-Lightning runs at 100 tokens per second for $0.30 per million input tokens and $2.40 per million output tokens. Running it continuously at full speed costs approximately $1 per hour. For comparison, a typical Claude Opus task costs roughly $3.00, versus $0.15 for M2.5.
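The $1-per-hour figure is easy to sanity-check from the quoted numbers. A minimal sketch, assuming continuous generation at the quoted speed (input-token costs would add a little on top):

```python
# Back-of-envelope check of the per-hour cost quoted above,
# using M2.5-Lightning's published output pricing and speed.

OUTPUT_PRICE_PER_M = 2.40   # $ per million output tokens
TOKENS_PER_SECOND = 100     # quoted generation speed

output_tokens_per_hour = TOKENS_PER_SECOND * 3600           # 360,000 tokens
output_cost_per_hour = output_tokens_per_hour / 1e6 * OUTPUT_PRICE_PER_M

print(f"Output cost per hour: ${output_cost_per_hour:.2f}")  # ≈ $0.86
```

Add input-token charges and you land right around the quoted $1 per hour of continuous generation.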
How They Did It
The secret is a Mixture-of-Experts (MoE) architecture. M2.5 has 230 billion total parameters but activates only 10 billion at any given time, dramatically reducing compute requirements while preserving capability.
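The mechanism behind those savings is top-k routing: a small gating function scores all experts per token and only the top-scoring few actually run. A minimal sketch of the idea; the expert count and top-k here are illustrative assumptions, not M2.5's actual configuration:

```python
# Top-k expert routing: the core trick that lets an MoE model carry
# 230B parameters while only paying compute for ~10B per token.
import math
import random

NUM_EXPERTS = 32   # total experts per layer (illustrative assumption)
TOP_K = 2          # experts actually executed per token

def route(token_logits):
    """Select the TOP_K highest-scoring experts and softmax-normalize
    their scores into mixing weights; the other experts never run."""
    ranked = sorted(range(len(token_logits)), key=lambda i: -token_logits[i])
    chosen = ranked[:TOP_K]
    exps = [math.exp(token_logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
for expert, weight in route(logits):
    print(f"expert {expert}: weight {weight:.2f}")
```

With 10B of 230B parameters active, only about 4% of the weights participate in any given forward pass, which is why serving costs can undercut dense models of comparable quality.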
MiniMax’s technical documentation describes training on 200,000+ real-world environments using reinforcement learning. The model covers 10+ programming languages across full-stack development scenarios: Python, TypeScript, Go, Rust, Java, C++, and others.
One notable capability: “architectural thinking” - the model decomposes projects and plans structure before writing code, rather than generating line by line.
M2.5 completes agentic tasks in approximately 20% fewer rounds than its predecessor M2.1, and finishes SWE-Bench Verified evaluations 37% faster.
Running It Locally
Unlike proprietary models, you can actually run M2.5 yourself. The full weights are available on Hugging Face, and the model is already in Ollama’s library.
For local deployment, Unsloth provides quantized GGUF versions. The 3-bit quantization (UD-Q3_K_XL) weighs in at 101GB and runs at 20+ tokens per second on a 128GB unified-memory Mac.
MiniMax recommends vLLM or SGLang for production deployment to achieve optimal performance.
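Getting started looks roughly like the commands below. The Ollama model tag and the Hugging Face repo id are assumptions for illustration (check the Ollama library and MiniMax's Hugging Face page for the exact names), as is the GPU count:

```shell
# Quick local run via Ollama (model tag assumed):
ollama run minimax-m2.5

# Production serving with vLLM (repo id and GPU count assumed):
vllm serve MiniMaxAI/MiniMax-M2.5 --tensor-parallel-size 8
```

For a single-machine setup like the 128GB Mac mentioned above, the quantized GGUF route is the practical option; vLLM or SGLang targets multi-GPU servers where throughput matters.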
Who Is MiniMax?
MiniMax was founded in December 2021 by former SenseTime researchers in Shanghai. CEO Yan Junjie previously served as SenseTime’s youngest vice president.
The company raised $850 million across four private funding rounds from investors including Alibaba, Tencent, and miHoYo (the studio behind Genshin Impact). In January 2026, they went public in Hong Kong, raising $619 million in an oversubscribed IPO at a share price of HK$165 ($21.18).
The company reports that internally, 30% of all tasks are completed autonomously by M2.5, and 80% of their newly committed code is generated by the model.
What This Means
For developers paying Claude or OpenAI API costs, M2.5 changes the economics. Tasks that weren’t cost-effective before become viable when you’re paying 5% of the previous price.
For the self-hosted crowd, this is the strongest open-weights coding model available. The MoE architecture makes it more practical to run locally than a dense model of equivalent capability would be.
For the AI industry, it’s another data point in the “open models catch up faster than expected” trend. The frontier moves quickly, but the open-source community is proving it can keep pace.
The Bottom Line
MiniMax M2.5 delivers Claude-class coding capability at commodity pricing, with fully open weights. If you’re running AI workloads at scale or want to host models yourself, it’s now the obvious choice for the price-performance ratio.