MiniMax just released an open-weight model that matches Claude Opus 4.6 on coding benchmarks while costing 5% as much to run. The model ships under an MIT license, meaning you can download it and run it yourself. There’s just one problem: Anthropic says MiniMax built it by stealing from Claude.
The Numbers
MiniMax M2.5 scores 80.2% on SWE-Bench Verified, placing it within 0.6 percentage points of Claude Opus 4.6’s 80.8%. It beats GPT-5.2 (80%) and pulls ahead of Gemini 3 Pro (78%).
On Multi-SWE-Bench, which extends SWE-Bench beyond Python to multiple programming languages, M2.5 actually edges out Claude Opus, 51.3% to 50.3%.
Tool calling is where things get interesting. The Berkeley Function Calling Leaderboard (BFCL) multi-turn benchmark shows M2.5 at 76.8%, with Claude Opus trailing at 63.3%. For agentic workflows that need to chain tool calls together, that gap matters.
Speed keeps pace with capability: M2.5 completes SWE-Bench tasks 37% faster than its predecessor, M2.1, matching Claude Opus 4.6's completion times.
Architecture and Pricing
M2.5 uses a mixture-of-experts architecture: 230 billion total parameters with only 10 billion active per forward pass. This keeps inference costs low while maintaining quality.
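A back-of-envelope calculation shows why sparse activation keeps inference cheap. Using the common rule of thumb of roughly 2 FLOPs per active parameter per generated token (an approximation, not a MiniMax-published figure):

```python
# Back-of-envelope: per-token inference compute for a sparse MoE
# vs. a hypothetical dense model with the same total parameter count.
# Rule of thumb: ~2 FLOPs per active parameter per generated token.

def flops_per_token(active_params: float) -> float:
    return 2 * active_params

total_params = 230e9   # M2.5 total parameters
active_params = 10e9   # parameters active per forward pass

moe = flops_per_token(active_params)
dense = flops_per_token(total_params)

print(f"MoE:   {moe:.1e} FLOPs/token")
print(f"Dense: {dense:.1e} FLOPs/token")
print(f"Ratio: {dense / moe:.0f}x less compute per token")
```

The full 230B parameters still have to sit in memory, which is why the self-hosting requirements below are steep even though per-token compute is modest.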
API pricing runs $0.30 per million input tokens and $1.20 per million output tokens. For comparison, Claude Opus 4.6 costs $15 per million input tokens. That’s 50x cheaper on inputs.
The model ships in two variants:
- M2.5 Standard: 50 tokens/second throughput
- M2.5 Lightning: 100 tokens/second at double the output cost ($2.40/million)
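To make the pricing concrete, here is a sketch of what a single agentic coding session might cost at each tier. The token counts are illustrative assumptions, not measurements, and only Claude's input-side price is compared since that's the figure cited above:

```python
# Cost sketch for a hypothetical agentic session.
# Token counts are assumed for illustration.

def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Prices are USD per million tokens; returns total USD."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

session = dict(input_tokens=2_000_000, output_tokens=300_000)  # assumed workload

m25_standard  = cost_usd(**session, in_price=0.30, out_price=1.20)
m25_lightning = cost_usd(**session, in_price=0.30, out_price=2.40)
opus_input_only = session["input_tokens"] / 1e6 * 15.00  # Opus 4.6 input side alone

print(f"M2.5 Standard:  ${m25_standard:.2f}")   # $0.96
print(f"M2.5 Lightning: ${m25_lightning:.2f}")  # $1.32
print(f"Opus 4.6, input tokens alone: ${opus_input_only:.2f}")  # $30.00
```

Even the faster Lightning tier stays well under what the same session's input tokens alone would cost on Claude Opus 4.6.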
Both are trained using MiniMax’s Forge reinforcement learning framework, which scales training across 200,000+ real-world environments including code repositories, browsers, and office applications.
Self-Hosting Requirements
You can run M2.5 locally, but you’ll need serious hardware.
The minimum setup requires 96GB of VRAM using aggressive 2-bit quantization. A quad-GPU RTX 3090 rig (4 × 24GB) works, as does a Mac Studio M4 Max with 96GB of unified memory.
For better quality with 3-bit dynamic quantization, plan on 128GB. Unsloth’s dynamic GGUF preserves critical attention layers at higher precision. On a 128GB Mac, expect around 20 tokens per second.
The full 200k context window adds another 40-60GB for the KV cache alone, on top of the model weights.
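These figures can be sanity-checked with a rough memory budget: quantized weights plus KV cache. The weight math follows directly from the parameter count; the KV cache depends on layer count, KV-head count, and head dimension, which MiniMax hasn't been cited for here, so the values below are illustrative assumptions chosen to land in the stated 40-60GB range:

```python
# Rough VRAM budget: quantized weights + KV cache.
# Layer/head/dim values are assumptions for illustration.

def weight_gb(total_params: float, bits_per_weight: float) -> float:
    return total_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
    # 2x for keys and values; fp16 elements by default
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / 1e9

weights_2bit = weight_gb(230e9, 2)   # ~57.5 GB
weights_3bit = weight_gb(230e9, 3)   # ~86.3 GB
kv = kv_cache_gb(n_layers=60, n_kv_heads=8, head_dim=128, context=200_000)

print(f"2-bit weights: {weights_2bit:.1f} GB")
print(f"3-bit weights: {weights_3bit:.1f} GB")
print(f"KV cache at 200K context (assumed dims): {kv:.1f} GB")
```

Add runtime overhead and activations on top of the weights, and the 96GB floor for 2-bit and 128GB recommendation for 3-bit both check out.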
Ollama lists minimax-m2.5, though local support appears limited to quantized GGUFs running through llama.cpp rather than native support for the architecture.
The Distillation Controversy
Here’s where it gets complicated.
On February 24, Anthropic accused MiniMax, along with DeepSeek and Moonshot AI, of running “industrial-scale distillation campaigns” against Claude. The allegation: these companies created 24,000 fraudulent accounts and generated over 16 million exchanges with Claude to extract its capabilities.
MiniMax allegedly drove the most traffic, accounting for over 13 million of those exchanges.
Distillation itself is a standard technique. Frontier labs routinely distill their own models to create smaller versions. But using it to copy a competitor’s capabilities without permission is a different matter.
According to Anthropic, MiniMax used commercial proxy services to bypass service restrictions that prevent commercial Claude access in China. When Anthropic launched new Claude models, MiniMax allegedly redirected nearly half its traffic to target the fresh capabilities.
Each campaign specifically went after Claude’s most differentiated features: agentic reasoning, tool use, and coding. Exactly the areas where M2.5 now excels.
MiniMax has not publicly responded to the accusations.
What This Means for Self-Hosters
The ethics here are murky. If Anthropic’s allegations are accurate, M2.5’s capabilities were essentially stolen. Using a model trained on illicitly obtained data raises questions, even if you’re running it locally on your own hardware.
But the technical reality is unchanged: M2.5 exists, it’s MIT-licensed, and it performs. For organizations that need local AI for privacy or compliance reasons, the model offers genuine frontier capability without cloud dependencies.
The licensing appears clean from a legal standpoint. MIT allows commercial use. Whether that licensing survives potential litigation is another question.
The Bigger Picture
MiniMax M2.5 represents a structural shift. The gap between open-weight and closed models has compressed from months to effectively zero. A model you can download and run locally now matches the best proprietary offerings on key benchmarks.
The distillation controversy complicates the narrative. China’s rapid AI progress isn’t just about engineering talent and compute access. According to U.S. companies, it also involves systematic extraction of Western model capabilities.
For now, M2.5 sits in an uncomfortable position: technically impressive, practically useful, ethically questionable.
The Bottom Line
MiniMax M2.5 matches Claude Opus 4.6 on coding tasks at a fraction of the cost, but Anthropic’s distillation accusations cast a shadow over how it got there.