Something unprecedented happened in the open-weight AI world over the past month: five frontier-class models shipped in roughly 30 days. Not incremental upgrades. Not fine-tunes of existing checkpoints. Full frontier models with new architectures, released under permissive licenses, performing within striking distance of the best closed models from OpenAI and Anthropic.
Here’s what dropped, what matters, and what it means for anyone running models locally or building on open infrastructure.
DeepSeek V4: The One That Matches the Closed Frontier
DeepSeek V4-Pro landed in late April with 1.6 trillion total parameters, 49 billion active per token, and a million-token context window. It’s released under the MIT license — no restrictions on commercial use or fine-tuning.
The numbers that matter: V4-Pro scores alongside GPT-5.5 and Claude Opus 4.7 on agentic benchmarks. Those are closed, proprietary models that cost several dollars per million output tokens. DeepSeek V4-Pro is self-hostable and available via API at a fraction of that cost.
The efficiency gains are just as significant. A hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention means V4-Pro needs only 27% of the single-token inference FLOPs and 10% of the KV cache compared to DeepSeek V3.2. For anyone running inference at scale, that’s a direct cut to the operating budget.
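To make those two ratios concrete, here is a minimal sketch that applies them to a baseline. The baseline values are hypothetical placeholders chosen for easy arithmetic, not measured V3.2 figures; only the 27% and 10% ratios come from the text above.

```python
# Ratios quoted above for V4-Pro relative to DeepSeek V3.2.
FLOPS_RATIO = 0.27     # single-token inference FLOPs
KV_CACHE_RATIO = 0.10  # KV cache size


def scaled_cost(baseline_flops: float, baseline_kv_gb: float) -> tuple[float, float]:
    """Apply the quoted ratios to a baseline FLOP count and KV cache size."""
    return baseline_flops * FLOPS_RATIO, baseline_kv_gb * KV_CACHE_RATIO


# Hypothetical baseline (assumption): 100 relative FLOP units, 40 GB of KV cache.
flops, kv_gb = scaled_cost(baseline_flops=100.0, baseline_kv_gb=40.0)
print(f"relative FLOPs per token: {flops:.1f}")
print(f"KV cache: {kv_gb:.1f} GB")
```

The KV cache figure is the one that tends to matter most at long context, since cache size grows with every token held in the window.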
There’s also V4-Flash at 284 billion total parameters (13 billion active), targeting the fast-and-cheap tier with the same million-token context.
License: MIT | Source: DeepSeek, MindStudio
Cohere Command A+: Enterprise-Grade Under Apache 2.0
Cohere released Command A+ on May 20 — a 218 billion parameter MoE model with 25 billion active, under a full Apache 2.0 license. That’s genuinely permissive: no usage restrictions, no reporting requirements, no commercial limitations.
The hardware story is the headline. Command A+ runs on two H100s or a single Blackwell GPU with what Cohere calls “virtually no quality degradation” across BF16, FP8, and W4A4 quantizations. If near-lossless quantization at 4-bit actually holds up in production, and early reports suggest it does, this changes the economics of self-hosted enterprise AI.
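A rough back-of-the-envelope check shows why the precision matters for the two-H100 claim. This sketch counts weight storage only and assumes the 80 GB H100 variant. It ignores KV cache, activations, and quantization metadata such as scales, so real requirements will be higher.

```python
PARAMS = 218e9  # total parameters quoted above for Command A+

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A4": 0.5}

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant
NUM_GPUS = 2


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


footprints = {fmt: weight_memory_gb(PARAMS, b) for fmt, b in BYTES_PER_PARAM.items()}
budget_gb = NUM_GPUS * H100_MEMORY_GB

for fmt, gb in footprints.items():
    verdict = "fits" if gb <= budget_gb else "does not fit"
    print(f"{fmt}: {gb:.0f} GB of weights, {verdict} in {budget_gb} GB")
```

Under these assumptions only the 4-bit weights fit within a pair of 80 GB cards, which is why the quality of the 4-bit quantization carries so much of the deployment story.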
Command A+ also ships with native citation generation. When the model pulls from an external source, it generates explicit grounding spans linking claims to specific documents. For regulated industries where audit trails matter, this is a feature that closed models don’t consistently offer.
License: Apache 2.0 | Source: Cohere, VentureBeat
ZAYA1-8B: Sub-Billion Active Parameters, Frontier Reasoning
This one is easy to overlook because of the model name, but ZAYA1-8B from Zyphra might be the most interesting architectural story of the month.
It’s a MoE with 8.4 billion total parameters and roughly 760 million active per token. Under one billion active parameters. And it matches or exceeds Mistral-Small-4-119B — a model with 119 billion parameters — on reasoning, math, and coding benchmarks. It stays competitive with first-generation frontier reasoning models like DeepSeek-R1 and early Gemini 2.5 Pro.
The training infrastructure is notable too: ZAYA1-8B was built entirely on AMD Instinct MI300X clusters with AMD Pensando Pollara networking. No NVIDIA hardware at all. Combined with Zhipu AI training GLM-4.7 entirely on Huawei Ascend silicon, the NVIDIA training monopoly is showing cracks.
Architectural innovations include Compressed Convolutional Attention (more efficient than standard attention), an MLP-based expert router for better routing stability, and learned residual scaling. All released under Apache 2.0.
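For readers unfamiliar with expert routing, the sketch below shows the general idea of scoring experts with a small MLP and keeping the top few. It is a generic toy illustration, not Zyphra's implementation: the layer sizes, the ReLU activation, the softmax over selected experts, and all function names are assumptions made for the example.

```python
import math
import random


def mlp_router(x, w1, w2):
    """Two-layer MLP: hidden = relu(W1 x), expert logits = W2 hidden."""
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]


def top_k_route(logits, k):
    """Keep the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
D_MODEL, D_HIDDEN, N_EXPERTS, TOP_K = 8, 16, 4, 2  # toy sizes (assumption)

# Each row holds the incoming weights of one hidden unit (w1) or one expert (w2).
w1 = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
w2 = [[random.gauss(0, 1) for _ in range(D_HIDDEN)] for _ in range(N_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(D_MODEL)]

routes = top_k_route(mlp_router(token, w1, w2), TOP_K)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for a given token, which is where the gap between total and active parameters comes from.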
License: Apache 2.0 | Source: HPCwire, MarkTechPost
NVIDIA Nemotron 3: The Hardware Company Goes Model-First
NVIDIA’s Nemotron 3 family — Nano, Super, and Ultra — uses a hybrid Mamba-Transformer MoE architecture, which is a mouthful but translates to 4x higher throughput than the previous generation while supporting up to a million-token context.
Nemotron 3 Super has already claimed the top spot on the EnterpriseOps-Gym leaderboard, beating both DeepSeek and GPT-OSS. Nano Omni is a multimodal model unifying vision, audio, and text in a single system — useful for anyone building agents that need to see, hear, and read.
The strategic angle matters more than the benchmarks. NVIDIA releasing competitive open models creates a flywheel: better open models drive more GPU demand for inference. It also gives enterprises a model they can run on NVIDIA hardware without licensing concerns, which is a direct play against closed API providers.
Source: NVIDIA Newsroom, WCCFTech
The Pattern: Sparse MoE Everywhere
Look at the architectures across these releases and the other major open-weight models currently shipping:
| Model | Total Params | Active Params | License |
|---|---|---|---|
| DeepSeek V4-Pro | 1.6T | 49B | MIT |
| Cohere Command A+ | 218B | 25B | Apache 2.0 |
| ZAYA1-8B | 8.4B | 760M | Apache 2.0 |
| Llama 4 Maverick | 400B | 17B | Llama |
| Qwen 3.5 | 397B | 17B | Apache 2.0 |
| Mistral Large 3 | 675B | 41B | Apache 2.0 |
Every major open-weight model shipping right now is a sparse Mixture-of-Experts. The total parameter count sets the memory footprint, since every expert’s weights still have to be loaded; the active parameter count sets the compute per token, and with it throughput and much of the serving bill. A 1.6 trillion parameter model that activates only 49 billion per token does far less work per token than a 200 billion parameter dense model, even though it needs more memory to hold.
This convergence on MoE isn’t accidental. So far it is the one architecture that has let open models match frontier closed models on quality while remaining deployable on available hardware. Dense models at this scale would need cluster-scale inference; MoE makes single-node or small-cluster deployment viable.
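The sparsity in the table is easier to compare as a ratio. The sketch below computes the active fraction for each model from the table's figures, plus a rough per-token compute estimate using the common rule of thumb of about 2 FLOPs per active parameter for a forward pass. That rule of thumb is an approximation that ignores attention cost, and it is an assumption here rather than a vendor figure.

```python
# (total parameters, active parameters per token), taken from the table above.
MODELS = {
    "DeepSeek V4-Pro": (1.6e12, 49e9),
    "Cohere Command A+": (218e9, 25e9),
    "ZAYA1-8B": (8.4e9, 760e6),
    "Llama 4 Maverick": (400e9, 17e9),
    "Qwen 3.5": (397e9, 17e9),
    "Mistral Large 3": (675e9, 41e9),
}


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used for any single token."""
    return active / total


def approx_forward_gflops(active: float) -> float:
    """Rule-of-thumb forward-pass cost per token: ~2 FLOPs per active parameter."""
    return 2 * active / 1e9


fractions = {name: active_fraction(t, a) for name, (t, a) in MODELS.items()}

# Print from sparsest to densest.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    gflops = approx_forward_gflops(MODELS[name][1])
    print(f"{name:<20} active {frac:6.1%}   ~{gflops:,.1f} GFLOPs/token")
```

Every model in the table activates well under an eighth of its weights per token, with DeepSeek V4-Pro the sparsest at roughly 3%.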
What This Means
Three things are happening simultaneously:
The quality gap is gone for most tasks. DeepSeek V4-Pro matches GPT-5.5 and Claude Opus 4.7 on agentic benchmarks. For the majority of production workloads, the difference between the best open and closed models is now a rounding error.
The hardware monoculture is cracking. ZAYA1-8B was trained on AMD and GLM-4.7 on Huawei Ascend, so the assumption that competitive training requires NVIDIA A100s or H100s no longer holds. NVIDIA, meanwhile, is now shipping open models of its own.
Permissive licensing won. MIT, Apache 2.0, and similar licenses dominate this wave. The era of “open-weight but you can’t compete with us” licensing (looking at you, early Llama) is fading. Cohere going full Apache 2.0 with Command A+ is particularly significant — a venture-backed company betting that open distribution beats closed control.
For anyone self-hosting: you have never had better options. The combination of MoE efficiency, aggressive quantization (Command A+ at 4-bit with, per Cohere, virtually no quality loss), and million-token contexts means frontier-quality AI is deployable on a pair of H100s or a single Blackwell. Two years ago that required a full data center.
The closed model providers still have advantages in training data, RLHF polish, and product integration. But the raw capability gap? It closed this month.