The Safety Teams Keep Disappearing

In one week, Anthropic's safety lead quit, an OpenAI researcher resigned over ads, and OpenAI disbanded its alignment team. Notice the pattern.

In the span of a single week in February, three things happened at the two companies that claim to take AI safety most seriously.

Mrinank Sharma, head of Anthropic’s Safeguards Research team, resigned and warned that “the world is in peril.” Zoe Hitzig, an OpenAI researcher, quit and published a New York Times essay warning that ChatGPT’s new advertising model creates “potential for manipulating users in ways we don’t have the tools to understand.” And OpenAI quietly disbanded its Mission Alignment team — the second dedicated safety team it has dissolved in under two years.

If you only track one metric for how seriously AI companies take safety, track this one: how long their safety teams survive.

The Anthropic Departure

Sharma had led Anthropic’s safeguards research since the team launched. His work included defenses against AI-assisted bioterrorism, research on sycophancy in language models, and a January 2026 study of 1.5 million real Claude conversations that found thousands of daily interactions producing distorted perceptions of reality.

His resignation letter was viewed over a million times on X. The key line: “Throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions.” He added that “we constantly face pressures to set aside what matters most.”

Anthropic was founded explicitly as the safety-first alternative to OpenAI. Its entire pitch to investors, researchers, and the public rests on the idea that it builds AI responsibly. When the person leading safeguards research says the company’s values buckle under commercial pressure, that pitch has a credibility problem.

Sharma did not name specific incidents. He described systemic pressure — the kind that makes individual failures impossible to pin down because the problem is baked into the incentive structure. He left to pursue a poetry degree and what he called “the practice of courageous speech.” Whatever you make of that career pivot, the contrast with corporate communications is sharp: a person who worked on existential risk decided the most honest thing he could do was leave.

The OpenAI Double Hit

On February 9, OpenAI began testing advertisements in ChatGPT for free-tier users. On February 10, Zoe Hitzig announced her resignation in the New York Times.

Hitzig’s concern was specific: ChatGPT has assembled what she called the most detailed record of private human thought ever collected. Users share medical fears, relationship problems, and religious beliefs — information confided to an entity they assumed had no hidden agenda. Introducing ads means building a business model on top of that archive.

“I don’t believe ads are immoral or unethical,” Hitzig wrote. “But I have deep reservations about OpenAI’s strategy.” She compared the trajectory to Facebook, which similarly promised user control before its advertising model incentivized it to abandon that commitment.

OpenAI’s internal numbers tell the story. Documents project that free-user monetization will generate $1 billion in 2026, scaling to $25 billion by 2029. The company is preparing for an IPO in Q4 2026. When you need $25 billion from advertising revenue, the incentive to protect user privacy is exactly as strong as it needs to be to avoid lawsuits, and not one dollar stronger.

Sam Altman responded that OpenAI would “never run ads” in an exploitative way. This is the same company that dissolved its Superalignment team in 2024, then its Mission Alignment team in February 2026, and is now building an advertising business on top of the most intimate conversational dataset in history. At some point, credibility requires actions that match words.

The Pattern Nobody Wants to Name

The Mission Alignment team was created in September 2024 with the specific purpose of ensuring AGI development benefits humanity. Sixteen months later, it was disbanded. Its seven members were reassigned. Its leader, Joshua Achiam, was given the title “chief futurist” — a role with no team and no operational authority over safety.

This is the second dedicated alignment team OpenAI has eliminated. The Superalignment team, co-led by Ilya Sutskever and Jan Leike, fell apart in May 2024. Leike resigned with a public statement that “safety culture and processes have taken a backseat to shiny products.” He went to Anthropic. Now Anthropic’s safety lead has resigned too.

The departures form a clear timeline:

  • May 2024: Sutskever and Leike leave OpenAI. Superalignment team collapses.
  • October 2024: Miles Brundage leaves OpenAI’s AGI Readiness team, saying “neither OpenAI nor any other frontier lab is ready” for AGI.
  • January 2025: Daniel Kokotajlo testifies to Congress about losing confidence that OpenAI would behave responsibly.
  • February 2026: Sharma leaves Anthropic. Hitzig leaves OpenAI. Mission Alignment team dissolved.

What is striking is the consistency of the message across different people, different companies, different time periods. These are not disgruntled employees angling for better equity packages at competitors. Most sacrificed significant compensation to speak publicly. The common thread is that safety teams at frontier AI labs exist at the pleasure of commercial imperatives, and when those imperatives conflict with safety, the safety teams lose.

The “Musical Chairs” Problem

Jacob Silverman, writing in the New York Times, made an uncomfortable observation: most departing safety researchers move to competing AI labs rather than leaving the industry. He called their resignation letters “de facto cover letters” for the next position.

This is a fair criticism, and it points to a structural problem. The pipeline from “I left because safety was being compromised” to “I joined a different lab that also ships products under commercial pressure” doesn’t actually change anything. It creates an illusion of accountability — researchers get credit for speaking up while the industry continues uninterrupted.

But Silverman’s critique works both ways. If even the researchers willing to resign can’t find a lab where safety takes genuine priority, that tells us something about the entire ecosystem, not just individual companies. The problem isn’t that these people lack conviction. The problem is that no current business model for frontier AI development has figured out how to make safety genuinely profitable.

What Gets Lost

Every time a safety team is dissolved, institutional knowledge disappears. The researchers who understood specific failure modes, who had the context to know which risks were theoretical and which were imminent, scatter to new organizations. Relationships with regulators, internal documentation of near-misses, the organizational memory of why certain safeguards were put in place — all of it evaporates.

OpenAI has now dissolved two dedicated alignment teams. The safety function still exists, distributed across other teams. But distributed responsibility is often another way of saying nobody’s responsibility.

Anthropic still has safety researchers. But the person who led safeguards — who studied the specific ways their products distort reality for thousands of users daily — is gone. His replacement will need months to build the same depth of understanding. Months during which the systems continue operating.

The companies will say safety remains a priority. They will point to remaining researchers, published papers, responsible scaling policies. But the metric that matters isn’t what companies say about safety. It is whether the people they hire to do safety work can do it long enough to make a difference, or whether the job is a temporary assignment that ends whenever it inconveniences the product roadmap.

Right now, the evidence points firmly in one direction.