The gap between open-weight and proprietary AI models has been narrowing for months. In April 2026, it got harder to see at all. DeepSeek's V4-Pro, MIT Technology Review says, falls only "marginally short" of frontier closed-source models, a distance that was measured in double-digit percentage points on major benchmarks a year ago. Six months ago, open-weight advocates were still arguing about whether open models could ever catch up. That argument is winding down fast.
DeepSeek V4: The New Open-Source Ceiling
DeepSeek released V4 on April 24 in two variants: V4-Pro (1.6 trillion parameters) and V4-Flash (284 billion parameters). Both feature a 1-million-token context window and a new "Hybrid Attention Architecture" designed to maintain coherence across long conversations.
The benchmarks are striking. V4-Pro approaches Claude Opus 4.6, GPT-5.4, and Gemini 3.1 on major evaluations, falling marginally short on some tasks while matching or exceeding them on others. It exceeds every other open-weight model — Qwen 3.5, GLM-5.1, all of them — on coding, math, and STEM tasks.
Then there’s the pricing. V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens. That’s one-sixth the cost of GPT-5.5. MIT Technology Review called it “the clearest signal yet” that frontier AI doesn’t have to be expensive.
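To make the pricing concrete, here is a rough back-of-the-envelope comparison. The V4-Flash rates come from the figures above; the GPT-5.5 rates are an assumption, inferred by scaling V4-Flash up sixfold per the "one-sixth" figure, not taken from any published price list.

```python
# Back-of-the-envelope monthly cost comparison.
# V4-Flash rates are DeepSeek's published prices; the GPT-5.5 rates
# are ASSUMED here (6x V4-Flash, per the "one-sixth" figure), not
# taken from an official price list.

V4_FLASH = {"input": 0.14, "output": 0.28}          # $ per million tokens
GPT_5_5 = {k: v * 6 for k, v in V4_FLASH.items()}   # assumption: 6x V4-Flash

def monthly_cost(rates, input_mtok, output_mtok):
    """Dollar cost for a workload measured in millions of tokens."""
    return rates["input"] * input_mtok + rates["output"] * output_mtok

# Example workload: 500M input tokens and 100M output tokens per month.
for name, rates in [("V4-Flash", V4_FLASH), ("GPT-5.5 (assumed)", GPT_5_5)]:
    print(f"{name}: ${monthly_cost(rates, 500, 100):,.2f}/month")
```

At that example volume, the same workload costs $98 on V4-Flash versus $588 under the assumed GPT-5.5 rates. The gap compounds quickly at production scale.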
The model is open-weight, which means you can download it, run it locally (if you have the hardware), fine-tune it, and deploy it without asking permission or paying royalties. For organizations that have been building on proprietary APIs and worrying about vendor lock-in, V4 is the strongest argument yet for switching.
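What "no permission required" looks like in practice is a few lines of code. Here is a minimal local-inference sketch using the Hugging Face transformers library; the repo id is a placeholder (check DeepSeek's Hugging Face organization for the real weights), and note that V4-Pro at 1.6T parameters needs a multi-GPU server, so V4-Flash is the more realistic local target.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The repo id below is a PLACEHOLDER, not a confirmed model id.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V4-Flash"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard weights across available GPUs/CPU
)

inputs = tokenizer("Explain vendor lock-in in one paragraph.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```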
Qwen Keeps Shipping
While DeepSeek grabbed headlines, Alibaba’s Qwen team has been releasing models at a pace that’s hard to track.
Qwen3.6-27B dropped on April 21 and outperformed the team's previous flagship, the 397B MoE model, across every major coding benchmark. A 27-billion-parameter model beating a 397-billion-parameter one matters because the smaller model runs on a single consumer GPU (see the arithmetic sketch below). That's the difference between a $30,000 server and a machine most developers already own.
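Why a 27B model fits on one card comes down to simple arithmetic. A rough sketch, assuming standard quantization and counting weights only (KV cache and activations add overhead on top, so treat these as lower bounds):

```python
# Rough VRAM estimate for a dense model's weights at various precisions.
# Ignores KV cache and activation memory; numbers are lower bounds.

def weight_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"27B @ {bits}-bit: ~{weight_gb(27, bits):.1f} GB")
```

At 16-bit, 27B parameters need roughly 54 GB (server territory); quantized to 4-bit, the weights shrink to about 13.5 GB, which fits on a single 16-24 GB consumer GPU.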
Qwen3.6-Max-Preview, released a day earlier, posted the highest scores on six major coding and agent benchmarks, including SWE-bench Pro and Terminal-Bench 2.0. Its MoE architecture activates just 3 billion of its 35 billion total parameters per token, so it runs fast while punching well above its weight class.
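The mechanism behind "35B total, 3B active" is top-k expert routing: a small gating network scores every expert, but only a handful actually run for each token. The sketch below is schematic, not Qwen's actual implementation, and the expert counts and dimensions are illustrative.

```python
# Schematic top-k MoE routing. NOT Qwen's implementation; the sizes
# here are illustrative. Each token is processed by only top_k of the
# n_experts expert networks, which is why active parameters per token
# can be a small fraction of total parameters.

import torch
import torch.nn.functional as F

n_experts, top_k, d_model = 64, 4, 512
gate = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(
    torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, d_model). Route each token to its top_k experts."""
    scores = F.softmax(gate(x), dim=-1)           # (tokens, n_experts)
    weights, idx = scores.topk(top_k, dim=-1)     # routing decision
    weights = weights / weights.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                   # naive per-token loop
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

tokens = torch.randn(8, d_model)
print(moe_forward(tokens).shape)  # torch.Size([8, 512])
```

Production implementations batch tokens per expert instead of looping, but the routing logic is the same: most of the network sits idle on any given token.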
All under Apache 2.0. No usage restrictions, no revenue caps, no approval process.
Apache 2.0 Becomes the Default
Speaking of licensing: something shifted in April. The industry’s biggest AI companies are converging on Apache 2.0 as the standard open-source license, and the holdouts are disappearing fast.
Google released all four Gemma 4 variants under Apache 2.0 — a first for the Gemma family, which previously had more restrictive terms. Google’s blog post was explicit about why: “By applying the industry-standard Apache 2.0 license terms, Google is providing clarity about developers’ rights and responsibilities so that they can build freely and confidently from the ground up.”
Mistral followed the same path. Every major 2026 release — Large 3, Small 4, Codestral 2, Voxtral TTS — ships under Apache 2.0. Previous Mistral code models had commercial restrictions. Those are gone.
Alibaba’s Qwen family: Apache 2.0. Zhipu’s GLM-5.1: MIT (even more permissive).
The trend matters because licensing was one of the last real barriers to enterprise adoption of open-weight models. Executives who wouldn’t approve deploying a model with ambiguous terms will approve Apache 2.0 — it’s the same license that governs Kubernetes, TensorFlow, and most of the software their companies already run. The legal team already knows what it means.
Ollama Hits 52 Million Monthly Downloads
The local AI movement hit a milestone that deserves attention: Ollama now has 52 million monthly downloads as of Q1 2026. That’s up from 100,000 in Q1 2023 — a 520x increase in three years.
Ollama’s GitHub repository has 169,000 stars. Its model library hosts 135,000+ GGUF-optimized models. Users have pulled models more than 2.5 billion times. All completely free, MIT-licensed, and designed so that no data ever leaves your machine.
The most popular models tell their own story: Llama 3.x leads overall adoption, Qwen 2.5 is the fastest-growing and best for coding, Mistral wins on efficiency, Gemma 3 on image understanding, and DeepSeek-R1 on reasoning. These are all open-weight models running locally on consumer hardware. No API keys, no subscriptions, no data sent to anyone.
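The "no API keys" workflow is easy to see in code. Ollama exposes a local HTTP API; the sketch below assumes the server is running on its default port and that a model has already been pulled (e.g. with `ollama pull llama3`).

```python
# Minimal request against Ollama's local HTTP API. Assumes the Ollama
# server is running on its default port (11434) and the model has been
# pulled beforehand, e.g. `ollama pull llama3`. Everything stays on
# the local machine; no API key is involved.

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "In one sentence, what is a GGUF file?",
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```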
When 52 million people are downloading a tool to run AI locally every month, “local AI” stops being a niche interest and starts being a market.
Hugging Face’s ml-intern: AI That Trains AI
Hugging Face released ml-intern on April 21, an open-source AI agent that automates the LLM post-training workflow. Built on the smolagents framework, it autonomously handles literature review, dataset discovery, training script execution, and iterative evaluation.
In testing, ml-intern took Qwen3-1.7B from a 10% baseline on GPQA (a graduate-level science benchmark) to 32% in under 10 hours. The tool runs the full loop — identifying relevant training data, writing and executing training scripts, evaluating results, and iterating — without human intervention.
This matters because post-training (fine-tuning, RLHF, evaluation) has been one of the most labor-intensive parts of working with open models. Making it automatic and open-source removes another barrier between downloading a model and having one that actually works for your specific use case.
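To see why automating this loop saves so much labor, here is a schematic of the workflow ml-intern runs. This is not ml-intern's actual API; every function here is a hypothetical stub standing in for a stage that the real agent delegates to an LLM via smolagents.

```python
# Schematic of the autonomous post-training loop described above.
# NOT ml-intern's real API: every function is a hypothetical stub
# standing in for a stage of the workflow.

import random

def find_dataset(benchmark: str) -> str:
    return f"candidate-data-for-{benchmark}"   # stub: dataset discovery

def train(model: str, dataset: str) -> str:
    return f"{model}-ft"                       # stub: write + run training script

def evaluate(model: str, benchmark: str) -> float:
    return random.uniform(0.10, 0.35)          # stub: benchmark evaluation

def post_training_loop(model: str, benchmark: str, target: float,
                       max_iters: int = 5) -> str:
    """Iterate discovery -> training -> evaluation until target is hit."""
    for i in range(max_iters):
        model = train(model, find_dataset(benchmark))
        score = evaluate(model, benchmark)
        print(f"iter {i}: {benchmark} score = {score:.1%}")
        if score >= target:
            break
    return model

post_training_loop("Qwen3-1.7B", "GPQA", target=0.32)
```

The reported 10% to 32% GPQA result is, in effect, this loop run unattended for under 10 hours, with each stub replaced by an agent step.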
The Broader Picture
Hugging Face’s spring 2026 report shows the platform’s user, model, and dataset repositories nearly doubled over the past year. More than 30% of Fortune 500 companies now have verified Hugging Face accounts. Sub-communities in robotics and scientific computing are growing.
The numbers from Fazm’s April tracker are just as stark: seven major open-source model releases in the first 12 days of April alone. Projects now ship with quantized weights, working inference code, and interactive demos from day one. The era of “here’s a paper, maybe we’ll release weights eventually” is over.
And the leaderboards tell the biggest story of all. Five of the top six models on BenchLM.ai’s open-weight rankings come from Chinese labs — DeepSeek, Moonshot, Zhipu, and Alibaba. The competitive pressure from Chinese open-source releases is pushing Western companies to release more, release faster, and release under more permissive licenses.
What This Means
If you’re still evaluating whether open-weight models are “ready” for production use, the window for waiting is closing fast. The best open models approach proprietary performance on most benchmarks, cost a fraction to run, can be deployed anywhere, and come with licenses that legal departments already understand.
The real question is shifting from whether open-source AI can compete to whether proprietary vendors can justify the premium. When DeepSeek V4-Flash delivers near-frontier performance at $0.14 per million tokens and you can run Qwen3.6-27B on your own hardware for free, the burden of proof is moving.
The open-source AI ecosystem narrowed the gap dramatically in April 2026. A margin remains on the hardest tasks, but for most production workloads, it may no longer matter.