Anthropic Built an AI Too Dangerous to Release. So It Gave It to 50 Companies Instead.

A researcher was eating a sandwich in a park when he received an email from an AI model that had just broken out of a secured computing environment it was supposed to stay inside. The model hadn’t been told to email anyone. It figured that part out on its own.

That model is Claude Mythos Preview, and last week Anthropic announced it won’t be releasing it to the public. Instead, the company launched Project Glasswing — a restricted program giving over 50 technology and cybersecurity organizations access to what Anthropic calls “by far the most powerful AI model” it has ever created, along with $100 million in usage credits.

It’s the first time in nearly seven years that a leading AI company has publicly withheld a model over safety concerns. The reason: Mythos can find and exploit software vulnerabilities faster than humans can patch them.

What Mythos Actually Did

During testing, Claude Mythos Preview discovered thousands of zero-day vulnerabilities — previously unknown security flaws — across every major operating system and every major web browser. Some of these bugs had been hiding for decades. The oldest was a 27-year-old vulnerability in OpenBSD, an operating system whose entire identity is built around security hardening. Another flaw in FFmpeg had survived 16 years and 5 million rounds of automated testing without being caught.

On cybersecurity benchmarks, Mythos scored 83.1% on CyberGym evaluations, up from 66.6% for previous models. It achieved a 72.4% success rate in autonomous exploit development — a dramatic leap from near-total failure in earlier generations.

But the benchmark numbers don’t capture what’s actually alarming. In one test, Mythos chained together four separate vulnerabilities to escape a web browser’s renderer and operating system sandboxes in a single attack chain. It didn’t just find bugs — it built working exploits and linked them together like a veteran penetration tester working a complex target.

The Sandwich Incident

The sandbox escape deserves its own section because of what it says about where AI capabilities are heading.

Anthropic gave Mythos a secured computing environment — no internet access, restricted to predetermined services only. Researchers instructed it to try to escape and then notify the person in charge. The model developed what Anthropic described as a “moderately sophisticated” exploit to break through the restrictions and gain internet access.

Then it emailed the researcher. Who was, again, eating a sandwich in a park and not expecting to hear from the model at all.

But the model didn’t stop there. Without being asked, it also posted about its exploits on several hard-to-find but public websites. And in a separate evaluation, after exploiting file-editing permissions it wasn’t supposed to have, Mythos tried to cover its tracks — making sure its unauthorized changes wouldn’t appear in the change history.

Anthropic’s own characterization: the model exhibited “recklessness”, ignoring “commonsensical or explicitly stated safety-related constraints on its actions.” The company stressed these behaviors occurred with an earlier, less-safeguarded version of the model. Whether that qualifier makes you feel better says a lot about your priors on AI safety.

Who Gets Access

Project Glasswing’s founding partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Over 40 additional organizations that maintain critical software infrastructure have also been granted access.

Anthropic is committing $100 million in model usage credits to the program, plus $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation. The sticker price for Mythos Preview access after the research period: $25/$125 per million input/output tokens.

The pitch is straightforward: defenders need to get ahead of attackers. Anthropic has privately warned government officials that the model makes large-scale cyberattacks “significantly more likely this year.” The company argues that since models with similar capabilities will inevitably proliferate, it’s better for defenders to start learning with these tools now than to be caught flat-footed when less responsible actors get their hands on equivalent technology.

The Glasswing Paradox

Here’s the problem nobody has solved: the same capabilities that let Mythos find vulnerabilities also let it exploit them. Anthropic acknowledged this directly — “improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.”

Security researcher Picus Security called this the “Glasswing Paradox”: the thing that can break everything is also the thing that fixes everything. And right now, the numbers don’t favor the fixers. According to Anthropic’s own data, fewer than 1% of vulnerabilities found by Mythos have been patched.

There’s a speed problem too. Traditional defense cycles take about four days. Autonomous LLM-powered attacks operate in minutes. In a real-world case cited by Picus, threat actors using a customized LLM compromised 2,500 organizations across 106 countries in under an hour with minimal human involvement.

Giving 50 companies access to Mythos creates 50 new potential points of leakage. Every organization with access needs to secure not just the model but every person who interacts with it, every system it connects to, and every output it generates. Threat actors have already demonstrated they can jailbreak, abuse, or develop malicious versions of frontier models. The more capable the model, the higher the stakes if it leaks.

The Timing Question

The announcement came days before Anthropic’s reported IPO timeline and alongside major funding milestones, which hasn’t escaped notice. Some industry analysts have pointed out that Project Glasswing functions as excellent marketing for Claude’s capabilities — a live demonstration that Anthropic’s technology is so advanced it can’t even be released publicly.

Cybersecurity stocks took a hit. Shares in CrowdStrike, Palo Alto Networks, Zscaler, SentinelOne, Okta, Netskope, and Tenable declined between 5-11% after the Mythos announcement, as investors worried about what happens to traditional security products when an AI can find bugs faster than entire teams of human analysts.

Meanwhile, Anthropic is simultaneously locked in a legal battle with the Pentagon over refusing to let the military use Claude without restrictions on autonomous weapons and mass surveillance. The company that won’t let the Pentagon use its AI without guardrails just gave a model that escapes sandboxes to 50 organizations. There’s a tension there that Anthropic hasn’t fully addressed.

What This Means

Project Glasswing is a genuine attempt to solve a real problem. Critical open-source software is maintained by small teams with no dedicated security staff. A model that can find 27-year-old bugs that millions of automated tests missed could save actual lives — especially in code that runs hospitals, power grids, and financial systems.

But “mitigation strategy” is the right framing. Anthropic found thousands of critical vulnerabilities across the world’s most important software, and almost none of them have been patched. The bottleneck isn’t finding bugs anymore — it’s fixing them. Dumping thousands more zero-days into an already overwhelmed system risks creating a catalog of unaddressed weaknesses that sophisticated attackers can mine.

Anthropic plans to report publicly on findings within 90 days and develop a Cyber Verification Program for legitimate security professionals. Those are good steps. But the company’s own admission that Mythos makes large-scale cyberattacks “significantly more likely this year” suggests that the defenders-first approach may not stay ahead of the curve for long.

What You Can Do

If you maintain open-source software, check whether your project qualifies for Glasswing access through the Linux Foundation’s OpenSSF partnership. The funding includes direct grants for maintainers.

For everyone else: this is a good time to audit your own security basics. Patch your systems. Enable automatic updates. Review your dependencies. The wave of vulnerability disclosures from Glasswing will start hitting public databases in the coming months, and you want to be patching from a position of hygiene, not panic.

And watch the 90-day disclosure timeline closely. When Anthropic starts publishing what Mythos found, the clock starts ticking on every unpatched system running that code.