The past week delivered some of the most significant open-source AI news in months. The creator of llama.cpp found a permanent home, China dropped a 744-billion-parameter model under MIT license, and a secure WhatsApp AI assistant exploded on GitHub. Here’s what matters.
llama.cpp and ggml Join Hugging Face
On February 20, Georgi Gerganov - the creator of ggml and llama.cpp - announced that his team was joining Hugging Face. This is a big deal for anyone who runs AI models locally.
For the uninitiated: llama.cpp is the project that made running large language models on consumer hardware practical. Before it existed, you needed expensive GPUs and complex setups. Now you can run sophisticated models on a MacBook or even a phone. The ggml tensor library that powers it has become foundational infrastructure for local AI.
The key commitments in the announcement:
- llama.cpp and ggml stay MIT licensed - no license changes
- Georgi and team keep full autonomy - they make all technical decisions
- 100% of their time stays on maintenance - not getting pulled into other projects
What changes? Better integration. The teams will work on making it “almost single-click” to ship new models from Hugging Face’s Transformers library to llama.cpp. That means when a new open model drops, you won’t wait weeks for llama.cpp support.
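For context, the current path from Hub to laptop already looks something like the sketch below; the integration work aims to collapse these steps. Script and binary names are from today's llama.cpp repo, but the model ID is a placeholder and exact flags may differ by version:

```shell
# Hypothetical workflow: pull a model from the Hub, convert to GGUF, run it.
# "some-org/some-model" is a placeholder, not a real checkpoint.
pip install -U "huggingface_hub[cli]"
hf download some-org/some-model --local-dir ./some-model

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Convert the Transformers checkpoint to GGUF
python convert_hf_to_gguf.py ../some-model --outfile model-f16.gguf

# Optionally quantize to 4-bit for a smaller memory footprint, then chat
./build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
./build/bin/llama-cli -m model-q4_k_m.gguf
```

This is the multi-step dance the "almost single-click" goal is meant to replace.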
The bigger picture: llama.cpp now has institutional backing without sacrificing independence. Hugging Face gets the most important local inference project on the planet. Users get faster model support and better tooling.
GLM-5: 744 Billion Parameters Under MIT License
On February 11, Chinese AI lab Z.ai (formerly Zhipu AI) released GLM-5 - a 744-billion-parameter model under the MIT license.
The numbers are attention-grabbing: 744B total parameters with 40B active (it’s a mixture-of-experts architecture), trained on 28.5 trillion tokens. Z.ai claims it matches Claude Opus 4.5 and GPT-5.2 on coding and agentic benchmarks.
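To see why the total-vs-active split matters, here's a back-of-envelope sketch. The parameter counts come from the release; the 8-bit memory floor is my own assumption for illustration:

```python
TOTAL_PARAMS = 744e9   # total parameters across all experts
ACTIVE_PARAMS = 40e9   # parameters activated per token (MoE routing)

# Per-token compute scales with the active parameters, not the total:
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%} of weights are active per token")  # → 5.4%

# Memory, by contrast, scales with the total. A rough floor for weight
# storage at 8-bit quantization (1 byte per parameter):
weights_gb = TOTAL_PARAMS / 1e9
print(f"~{weights_gb:.0f} GB just to hold 8-bit weights")  # → ~744 GB
```

In other words: it's cheap to run per token but still needs serious hardware just to load, which is the usual MoE trade-off.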
But here’s what makes it genuinely interesting:
Trained entirely on Huawei Ascend chips. No NVIDIA hardware was used. This matters because it demonstrates that the chip shortage and export controls haven’t stopped Chinese AI development - they’ve just forced alternative paths.
True MIT license. Not Apache with weird clauses. Not “open weights but restricted use.” MIT means you can use it commercially, modify it, and distribute it however you want.
Pricing that undercuts everyone. API access runs about $0.80 per million input tokens and $2.56 per million output tokens - roughly one-sixth the price of comparable proprietary models.
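Those per-million rates are easy to turn into concrete costs. A quick sketch using the quoted prices (the token counts are a made-up example session):

```python
def glm5_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GLM-5 API cost in USD at the quoted prices."""
    INPUT_PER_M = 0.80   # $ per million input tokens
    OUTPUT_PER_M = 2.56  # $ per million output tokens
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A sizable agentic session: 50k tokens in, 10k tokens out
print(f"${glm5_cost(50_000, 10_000):.4f}")  # → $0.0656
```

At those rates, even heavy agentic workloads land at fractions of a cent per interaction.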
The weights are available on Hugging Face and ModelScope. Whether the benchmark claims hold up in practice remains to be seen, but having another serious open-source contender in the 700B+ parameter range benefits everyone.
NanoClaw: Secure AI Agents Hit 14K Stars
Sometimes the most useful open-source projects aren’t foundation models - they’re tools that make existing models practical. NanoClaw, a security-first alternative to OpenClaw, crossed 13.9K GitHub stars this week.
What it does: runs Claude-based AI agents in isolated containers that connect to WhatsApp, Telegram, Discord, and other messaging platforms. Each conversation gets its own sandboxed environment with separate memory and filesystem.
Why it matters: running AI agents that can execute code is inherently dangerous. Most approaches either give the agent full system access (bad) or cripple its capabilities (defeats the purpose). NanoClaw uses OS-level container isolation - agents run in their own Linux containers and can only access directories explicitly mounted for them.
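The mount-only-what-you-need pattern is easy to see with plain Docker. This is an illustrative command, not NanoClaw's actual invocation; directory names and resource caps are placeholders:

```shell
# Illustrative only: the agent process inside the container sees just the
# one bind-mounted directory. The rest of the host filesystem is invisible.
docker run --rm -it \
  --memory 2g \
  --cpus 2 \
  -v "$PWD/chats/group-42:/workspace" \
  -w /workspace \
  ubuntu:24.04 bash
```

Even if the agent runs arbitrary code, the blast radius is capped at that one mounted directory plus whatever network access you grant the container.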
The entire codebase is about 3,900 lines of code across 15 files. You can read the whole thing in an afternoon. It’s built on Anthropic’s Claude Agent SDK and uses Apple Container on macOS or Docker on Linux.
For group chats, each group gets its own CLAUDE.md memory file and isolated filesystem. Scheduled tasks work out of the box. It’s MIT licensed, and you only pay for Anthropic API usage.
Also Worth Noting
Transformers v5 launched - Hugging Face’s first major library release in five years. The ecosystem has grown from 40 model architectures to over 400, with 750,000+ model checkpoints on the Hub. PyTorch is now the primary framework, with TensorFlow and Flax support being sunset.
Natively emerged as an open-source alternative to Cluely for meeting assistance. It supports local models through Ollama, keeps all transcripts and embeddings local, and works with Google Meet, Zoom, and Teams. No backend, no telemetry.
NVIDIA’s Alpamayo models for autonomous vehicles went open-source, available on Hugging Face. Includes the first open chain-of-thought reasoning model for AV research plus 1,700 hours of driving data.
What This Means
The theme this week: infrastructure maturation.
Open-source AI is moving past the “impressive demo” phase into sustainable projects with long-term homes. ggml joining Hugging Face isn’t about acquisition - it’s about ensuring llama.cpp survives regardless of what happens to any individual contributor. GLM-5 proves that open models can compete at the frontier level. NanoClaw shows that practical tooling around agents is catching up to the agents themselves.
The gap between what you can run locally versus what you need to pay for keeps shrinking. A year ago, running a competitive LLM on your laptop was impressive. Now it’s table stakes. The interesting questions have shifted to: What can you build with these tools? How do you make them safe? How do you integrate them into workflows?
For anyone building on open-source AI: the foundations are more solid than they’ve ever been.