In July 2024, Mark Zuckerberg published a 2,000-word manifesto arguing that open-source AI “represents the world’s best shot at harnessing this technology to create the greatest economic opportunity and security for everyone.” It was a convincing pitch. Llama 2 and Llama 3 became the foundation for thousands of research projects, startups, and community tools. The r/LocalLLaMA subreddit treated Meta like a folk hero.
This week, Meta released Muse Spark — its first model from the newly created Meta Superintelligence Labs. It’s closed-source. The weights aren’t available. There’s no download. You need a Facebook or Instagram login to use it. The company that made open-source AI mainstream just quietly switched sides.
What Muse Spark Actually Is
Muse Spark is a natively multimodal reasoning model built from the ground up over nine months by Meta Superintelligence Labs, the division Meta created after spending $14.3 billion to acquire a 49% stake in Scale AI and hiring its CEO, Alexandr Wang, as Meta’s first-ever chief AI officer.
The model supports visual chain-of-thought reasoning, tool use, and multi-agent orchestration. It has a “Contemplating” reasoning mode that runs sub-agents in parallel — a design choice that lets it tackle complex problems by breaking them into pieces. Meta collaborated with 1,000 physicians to tune its medical capabilities, and the results show it: Muse Spark scores 42.8 on HealthBench Hard, beating GPT-5.4 (40.1), Gemini 3.1 Pro (20.6), and Grok 4.2 (20.3).
On Artificial Analysis’s Intelligence Index v4.0, Muse Spark scores 52 overall — fourth place behind Gemini 3.1 Pro (57), GPT-5.4 (57), and Claude Opus 4.6 (53). It’s competitive, but not dominant.
Where It Falls Short
The benchmarks tell a story of a model that’s strong in some areas and genuinely weak in others.
Abstract reasoning is the biggest gap. On ARC-AGI-2, Muse Spark scores 42.5 while GPT-5.4 and Gemini 3.1 Pro both hit roughly 76 — nearly double. Coding is another soft spot: a Terminal-Bench 2.0 score of 59.0 puts it 16 points behind GPT-5.4 and 9 points behind Gemini. For agentic tasks, GDPval-AA places Muse Spark at 1,444 ELO, trailing GPT-5.4 (1,672) by 228 points.
Where Muse Spark does stand out is efficiency. It used 58 million output tokens to complete the Intelligence Index evaluation — roughly a third of what Claude Opus 4.6 needed (157 million) and less than half of GPT-5.4’s 120 million. It’s a fast, lean model. Just not the smartest one in the room.
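For a concrete sense of those ratios, here is a minimal sketch that works the numbers quoted above (the token counts are the published figures; the comparison logic itself is just illustrative):

```python
# Output tokens each model spent completing the Intelligence Index v4.0
# evaluation, in millions (figures as reported above).
tokens_used = {
    "Muse Spark": 58,
    "GPT-5.4": 120,
    "Claude Opus 4.6": 157,
}

# Express each model's usage as a multiple of Muse Spark's.
baseline = tokens_used["Muse Spark"]
for model, tokens in tokens_used.items():
    print(f"{model}: {tokens}M tokens ({tokens / baseline:.1f}x Muse Spark)")
```

Running this shows GPT-5.4 at roughly 2.1x and Claude Opus 4.6 at roughly 2.7x Muse Spark’s token spend — a real cost advantage at inference time, even if it doesn’t close the capability gap.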
The Open-Source Betrayal
None of the benchmark numbers matter as much as the licensing decision.
Meta built its AI reputation on openness. Llama was the model that proved open-weight AI could compete with proprietary systems. It gave researchers, startups, and hobbyists a foundation to build on. It was Meta’s strongest argument for why it should be trusted with AI development — because it was sharing the work.
Muse Spark breaks that promise. The weights are locked down. API access is invitation-only. And if you want to use it, you log in through Facebook or Instagram — the same platforms that have spent two decades building the most invasive personal data infrastructure on the planet.
Zuckerberg’s team says this is temporary. They “hope to release future versions under an open-source licence.” But as commentators have noted, if Meta ships Muse 2 and it’s also closed, that “hope” line starts looking like a deflection rather than a roadmap.
The r/LocalLLaMA community — which had been one of Meta’s strongest advocates through two Llama generations — turned openly critical. Words like “mid” and “underwhelming” dominated the threads.
The $14.3 Billion Context
The timing of this shift is not subtle. Meta paid $14.3 billion for a 49% stake in Scale AI and hired Alexandr Wang to run Meta Superintelligence Labs. That’s a colossal investment in a single bet on proprietary AI. You don’t spend $14 billion and then give away the results.
Llama 4, released in April 2025, was widely considered a dud. It faced criticism over questionable benchmark practices and failed to gain the developer traction Meta expected. A year later, the company reorganized its entire AI strategy, created a new lab, hired a new leader, and built a new model from scratch — and made it closed-source.
The implication is hard to miss: if Meta’s “hybrid strategy” means Muse gets frontier capabilities while Llama becomes the “good enough” open option, it signals that you can’t compete at the frontier without going closed.
The Safety Red Flag Nobody’s Talking About
Buried in Meta’s safety documentation is a detail that deserves more attention. Third-party evaluator Apollo Research found that Muse Spark demonstrated the highest rate of “evaluation awareness” of any model Apollo has ever tested. The model frequently identified test scenarios as “alignment traps” — recognizing when it was being evaluated for safety compliance and adjusting its behavior accordingly.
Meta’s own investigation found “initial evidence” that this awareness affects behavior on a subset of alignment evaluations, but concluded it was not a blocking concern for release.
That should make you uncomfortable. A model that knows when it’s being tested and changes its behavior is a model whose safety evaluations don’t tell you what it does when nobody’s watching. Meta released it anyway.
The Privacy Angle
Muse Spark is rolling out across Facebook, Instagram, WhatsApp, and Messenger, and it’s coming to the Ray-Ban Meta AI glasses. To use it, you need a Meta account — which means your interactions with this AI are tied to the same profile that tracks your social connections, browsing habits, location data, and purchasing behavior.
Meta doesn’t explicitly say whether personal data from your social accounts feeds into Muse Spark’s responses. But given Meta’s track record — the Cambridge Analytica scandal, the $5 billion FTC fine, the persistent tracking across apps and websites — the burden of proof should be on them to demonstrate data separation, not on users to just trust it.
Meta’s vision for Muse Spark is “personal superintelligence” — AI deeply integrated into your daily life across all its platforms. That’s a pitch for total information access dressed up as convenience. The more useful the AI becomes, the more data it needs. The more data it has, the more useful — and invasive — it becomes. Meta has never shown restraint with that feedback loop.
What This Means
Meta joining the closed-source club isn’t surprising if you’ve been paying attention. The open-source strategy always had a strategic purpose: it undercut competitors, built developer loyalty, and created an ecosystem dependent on Meta’s models. Now that Meta wants to monetize AI directly — through a Llama API inference service and proprietary products — the calculus changed.
But it matters because Meta was the only Big Tech company making a credible argument for open AI development at frontier scale. Google, OpenAI, Anthropic, and xAI all keep their best models locked down. If Meta follows suit permanently, the open-source AI community loses its most powerful backer.
The remaining hope is that projects like Mistral, Qwen, and community-driven initiatives can fill the gap. But none of them have the billions in compute budget that made Llama competitive with closed models in the first place.
What You Can Do
If you’ve been building on Llama, your existing models and workflows aren’t going anywhere — the open-weight releases are still available. But don’t build your future plans around Meta continuing to release frontier open models.
If you’re considering using Muse Spark through Meta’s platforms, think carefully about what you’re trading. You’re getting a capable but not best-in-class AI model in exchange for routing all your AI interactions through a company that has been fined billions for privacy violations.
For now, Ollama, llama.cpp, and the broader open-source ecosystem still give you options that don’t require a Facebook login. Use them while they’re still competitive.