Open Source AI Wins: Google Drops Gemma 4 Under Apache 2.0, Mozilla Builds a Copilot Killer, and a 30-Person Lab Ships a 400B Model

Google gives Gemma 4 a real open-source license. Mozilla launches Thunderbolt for self-hosted enterprise AI. Arcee AI trains a 400B reasoning model for $20 million. And Milla Jovovich broke GitHub.


Google shipped Gemma 4 under Apache 2.0 — no custom clauses, no revenue thresholds, no fine print. Mozilla launched an open-source alternative to ChatGPT Enterprise. A 30-person San Francisco lab trained a 400-billion-parameter reasoning model for $20 million and released it for free. And an actress best known for fighting zombies somehow built the most popular AI memory system on GitHub.

Here’s what happened in open-source AI this week.

Gemma 4: Google Finally Ships a Real Open-Source License

Google released Gemma 4 on April 2, and the license matters more than the benchmarks.

Previous Gemma releases came with Google’s custom terms — usage restrictions, acceptable use policies, and enough legal ambiguity to make enterprise lawyers nervous. Gemma 4 ships under Apache 2.0. That’s the same license used by Kubernetes, TensorFlow, and Android. No custom clauses, no enterprise carve-outs, no revenue caps.

The model family includes four variants: E2B (2B), E4B (4B), 26B MoE, and 31B dense. The 31B dense model is the standout — it scores 89.2% on AIME 2026, hits a Codeforces Elo of 2,150, and ranks #3 on Arena AI with an Elo of 1,452. All models handle vision, video, and text natively across 140+ languages, with context windows up to 256K tokens.

The smallest variants run on phones. The 31B runs on a workstation GPU. And the Apache 2.0 license means you can fine-tune it, sell products built on it, and never owe Google a thing.

For the open-source AI community, the license shift is the headline. When Google — a company that could lock this behind an API and charge per token — chooses genuine open source, it validates the model that Meta, Alibaba, and Zhipu have been pushing. The more major labs compete on openness, the harder it gets for anyone to close the door.

Mozilla Thunderbolt: An Open-Source Enterprise AI Client

Mozilla’s for-profit arm MZLA Technologies announced Thunderbolt on April 16 — an open-source, self-hostable AI client designed to compete with Microsoft Copilot, ChatGPT Enterprise, and Claude Enterprise.

Thunderbolt is a “sovereign AI client” that lets organizations run AI on their own infrastructure. You choose your models, connect to your enterprise data, and keep everything in-house. It ships as a web app plus native builds for Linux, macOS, Windows, iOS, and Android. The backend is built in partnership with deepset, the Berlin-based company behind the Haystack agent framework.

It’s early. Mozilla acknowledges the project is under active development, still undergoing a security audit, and not yet production-ready. But the positioning is smart. Enterprises that want AI assistants but can’t (or won’t) send their data to OpenAI, Microsoft, or Google now have an open-source option from an organization with 25 years of credibility in the privacy-first space.

The timing is good too. With the EU AI Act enforcement ramping up and enterprise data sovereignty becoming a procurement requirement, “we run our own AI” is shifting from nice-to-have to compliance checkbox.

Arcee AI Trinity: A 400B Reasoning Model From a 30-Person Lab

Arcee AI, a 30-person San Francisco startup, released Trinity Large Thinking under Apache 2.0 — a 400-billion-parameter sparse mixture-of-experts model built for autonomous agents and long-horizon reasoning.

The numbers are striking. Trinity activates only 13 billion parameters per token using a 4-of-256 expert routing strategy, so inference costs stay manageable despite the massive total parameter count. It supports a 262,144-token context window and currently ranks #2 on PinchBench, a benchmark for autonomous agent capabilities.
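For readers unfamiliar with sparse mixture-of-experts routing, here is a minimal Python sketch of top-4-of-256 expert selection. It illustrates the general technique, not Arcee's actual router; the logits are random placeholders. (In MoE models, active parameters also include shared attention and embedding weights, which is typically why the 13B active figure exceeds a naive 400B × 4/256 estimate.)

```python
# Sketch of sparse MoE routing as described for Trinity: a router scores
# all 256 experts per token, but only the top 4 are activated.
# Illustrative only — not Arcee's implementation.
import math
import random

NUM_EXPERTS = 256
TOP_K = 4

def route(router_logits):
    """Return indices and normalized weights of the top-k experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    # Softmax over only the selected experts (a common MoE convention),
    # so the 4 active experts' weights sum to 1.
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)
print(len(experts), round(sum(weights), 6))  # 4 experts, weights sum to 1
```

The point of the 4-of-256 design is visible here: per-token compute scales with the 4 selected experts, not the 256 that exist.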

The training run tells you something about where the economics of frontier AI are heading. Arcee spent $20 million on 2,048 NVIDIA B300 Blackwell GPUs for 33 days, training on 20 trillion tokens. That’s serious money for a startup — but it’s a rounding error compared to what OpenAI or Google spend on their frontier models. If a team of 30 can ship a competitive 400B model for $20 million, the barrier to entry for frontier-class open models is lower than the industry narrative suggests.
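The reported figures imply a specific price point, which is easy to check with back-of-the-envelope arithmetic (all inputs taken from the numbers above):

```python
# Back-of-the-envelope check on the reported Trinity training run:
# 2,048 GPUs for 33 days, $20M total, 20 trillion tokens.
gpus, days = 2048, 33
cost_usd = 20_000_000
tokens = 20_000_000_000_000

gpu_hours = gpus * days * 24                  # total GPU-hours consumed
price_per_gpu_hour = cost_usd / gpu_hours     # implied all-in GPU price
cost_per_trillion = cost_usd / (tokens / 1e12)

print(f"{gpu_hours:,} GPU-hours")
print(f"${price_per_gpu_hour:.2f} per GPU-hour")
print(f"${cost_per_trillion:,.0f} per trillion training tokens")
```

That works out to roughly 1.6 million GPU-hours at about $12 per GPU-hour all-in, or about $1 million per trillion tokens — the kind of budget a well-funded startup can actually raise.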

Apache 2.0 license, no restrictions. Download the weights, modify them, deploy commercially. The kind of openness that Meta’s Llama license still doesn’t quite match.

MemPalace: The Most Popular AI Memory System on GitHub

In the “sentences I never expected to write” category: actress Milla Jovovich pushed a repository to GitHub on April 5 that crossed 23,000 stars in 48 hours and became the #1 trending repo on the platform.

MemPalace gives large language models persistent, cross-session memory using a structured “memory palace” architecture. Instead of flat vector search, it organizes information hierarchically — people and projects become wings, topics become rooms, and original content lives in drawers. The whole thing runs locally on just two dependencies: ChromaDB and PyYAML.
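The wings/rooms/drawers description maps naturally onto a nested structure. Here is a toy sketch of that idea — invented names and methods, not MemPalace's actual API:

```python
# Toy sketch of a "memory palace" hierarchy as the article describes it:
# wings (people/projects) -> rooms (topics) -> drawers (content).
# Illustrative only — not MemPalace's actual code.
from collections import defaultdict

class MemoryPalace:
    def __init__(self):
        # wing -> room -> list of stored drawer contents
        self.wings = defaultdict(lambda: defaultdict(list))

    def store(self, wing, room, content):
        self.wings[wing][room].append(content)

    def recall(self, wing, room):
        # Retrieval narrows to one room instead of scanning a flat index —
        # the hierarchy is what's meant to beat flat vector search.
        return self.wings[wing][room]

palace = MemoryPalace()
palace.store("project-atlas", "deadlines", "Launch review moved to May 3")
palace.store("project-atlas", "stack", "Backend is FastAPI + Postgres")
print(palace.recall("project-atlas", "deadlines"))
```

The real project layers vector search (via ChromaDB) inside this hierarchy; the sketch shows only the organizing principle.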

The technical results are genuine. MemPalace scores 96.6% recall@5 on the LongMemEval benchmark, a 35.7-point improvement over flat vector search (60.9%). The memory layer itself makes no LLM API calls: classification, chunking, and compression all run on regex heuristics and keyword scoring, and the system adds roughly 170 tokens of overhead per query.
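A keyword-scoring classifier in that spirit can be surprisingly small. A hedged sketch, with invented room names and keyword lists — the actual MemPalace heuristics are certainly more elaborate:

```python
# Minimal keyword-scoring classifier illustrating the article's claim that
# memories can be routed to rooms with regex/keyword heuristics instead of
# LLM calls. Room names and keywords are invented for illustration.
import re

ROOM_KEYWORDS = {
    "deadlines": ["due", "deadline", "by friday", "ship", "launch"],
    "stack": ["api", "database", "python", "framework", "server"],
    "people": ["meeting", "asked", "said", "intro", "call"],
}

def classify(text):
    """Score each room by keyword hits; the highest-scoring room wins."""
    lowered = text.lower()
    scores = {
        room: sum(1 for kw in kws if re.search(re.escape(kw), lowered))
        for room, kws in ROOM_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

print(classify("The launch is due Friday, ship the release candidate"))
# classifies under "deadlines" (3 keyword hits vs. 0 for the other rooms)
```

Because routing is a dictionary of patterns rather than a model call, it costs no API tokens and runs in microseconds — which is how the memory layer keeps its per-query overhead down.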

The asterisk: MemPalace initially claimed 100% recall, which turned out to be based on hand-tuned fixes for specific test cases. The team revised the claim to 96.6% after community pushback. That’s still excellent, and the correction speaks well of the project’s responsiveness.

Whether it was built by a Hollywood actress or a career ML engineer, the code works. And the fact that a non-traditional developer created one of the month’s most impactful open-source AI projects is its own kind of win for the community.

The Agent Framework Surge

April has been unusually dense for open-source agent tooling. Three frameworks worth tracking:

Google’s Agent Development Kit (ADK) hit 8,200+ stars within weeks of launch. It’s a code-first Python framework for building multi-agent systems, released under Apache 2.0. While optimized for Gemini, it’s model-agnostic and works with any LLM. TypeScript, Go, and Java versions are also available.

Block’s Goose reached 4,900+ stars with a local-first approach — native MCP support, runs on your machine, designed for developers who want agent capabilities without cloud dependencies.

llama.cpp pushed four tagged releases in a single week, with significant Vulkan flash attention improvements for AMD GPUs. Benchmarks show llama.cpp b8765 hitting 52-56 tokens/second on hardware where Ollama manages 34 tokens/second — a gap that matters for anyone running models locally on AMD hardware.
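For context, the throughput figures above (52-56 tokens/second versus 34) work out to roughly a 1.5x-1.65x speedup — a quick check:

```python
# Relative throughput implied by the reported AMD-GPU benchmarks:
# llama.cpp b8765 at 52-56 tok/s vs. Ollama at 34 tok/s on the same hardware.
llamacpp_low, llamacpp_high, ollama = 52, 56, 34
print(f"{llamacpp_low / ollama:.2f}x to {llamacpp_high / ollama:.2f}x faster")
```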

The pattern: agent frameworks are converging on MCP as the standard protocol, Apache 2.0 as the standard license, and local-first as a first-class deployment target. The infrastructure for running sophisticated AI agents on your own hardware is maturing fast.

What This Means

Two weeks ago, this column noted that open-weight models were claiming benchmark leads. That’s still happening — but this week’s story is bigger than benchmarks.

Google moved Gemma to Apache 2.0. Mozilla built an open-source enterprise AI client. A 30-person startup shipped a 400B model under a fully permissive license. The agent framework ecosystem is consolidating around open standards. The infrastructure layer — from model weights to memory systems to orchestration tools — is going open at every level of the stack.

The companies selling proprietary AI aren’t just competing against open-source models anymore. They’re competing against open-source everything: models, agents, memory, deployment, and now enterprise clients. Each layer that goes open makes the next layer harder to keep closed.

For anyone building with AI, the practical takeaway is simple: check what’s available in open source before reaching for an API key. The answer might surprise you.