Top Stories
OpenAI Puts Ads in ChatGPT, One Day After Anthropic’s Anti-Ad Super Bowl Blitz
The timing could hardly have been better for Anthropic’s marketing department. One day after the Super Bowl aired Anthropic’s “A Time and a Place” campaign mocking the idea of ads inside AI chatbots, OpenAI announced it is officially testing ads in ChatGPT. Sponsored links now appear at the bottom of ChatGPT responses for users on the free tier and the $8/month Go plan.
OpenAI says the ads are labeled as “sponsored” and “do not influence the answers ChatGPT gives you.” Users on Plus ($20/month), Pro, Business, Enterprise, and Education plans won’t see ads. The company framed this as a necessary step to sustain a free product that costs billions to run.
Whatever OpenAI’s reasoning, the optics are striking. Anthropic spent an estimated $14 million on Super Bowl ads specifically promising that Claude would never show ads in conversations. Within 24 hours, its competitor proved the threat was real. Whether this validates Anthropic’s strategy or just highlights how expensive it is to run an AI company without ad revenue depends on how long Anthropic can afford to keep that promise - a question made more pointed by the company’s simultaneous push to raise another $20 billion.
Sources: The Verge, TechCrunch
Anthropic Closing In on $20 Billion Funding Round
Anthropic is nearing completion of a $20 billion funding round, just five months after closing a $13 billion raise. The company raised that previous round at a reported $60 billion valuation, and this new round would push Anthropic’s valuation substantially higher.
The speed of the fundraise reflects the reality of frontier AI economics. Training and running models like Claude Opus 4.6 requires enormous compute resources, and competition with OpenAI, Google, and xAI shows no signs of slowing. The race to build agentic AI systems - the kind that can work autonomously on complex tasks rather than just answer questions - is accelerating the capital burn for every major lab.
The timing is notable. Anthropic is raising this money while simultaneously running a Super Bowl ad campaign, launching its most capable model to date, and watching its competitor introduce ads to offset costs. The AI industry’s central tension is on full display: these companies need revenue to fund the compute, but the most promising products cost more to run than they bring in.
Sources: TechCrunch
Chatbots Fail at Diagnosing Real Patients, Oxford Study Finds
A controlled study of 1,298 participants, published in Nature Medicine, found that large language models performed dramatically worse at medical diagnosis when real people described symptoms to them in conversation than when researchers fed them clean clinical scenarios directly.
When researchers gave LLMs - including GPT-4o, Llama 3, and Cohere’s Command R+ - the full text of clinical scenarios, the models correctly identified conditions 94.9% of the time. But when actual people described those same conditions to the chatbots in natural conversation, accuracy plummeted to below 34.5%. Participants didn’t know what information the chatbots needed, and the models couldn’t ask the right follow-up questions.
The most alarming finding: in one case, two users described nearly identical symptoms of a subarachnoid hemorrhage. One was told to lie down in a dark room. The other received the correct advice to seek emergency care. The models also generated incorrect phone numbers, suggested calling Australian emergency services for UK patients, and fixated on irrelevant details from user messages.
“Despite all the hype, AI just isn’t ready to take on the role of the physician,” said Dr. Rebecca Payne, the study’s lead medical practitioner. The gap between benchmark performance and real-world utility is one of the most important findings in recent AI research - and one the industry has strong incentives to downplay.
Sources: 404 Media
Research & Analysis
Harvard Business Review: AI Doesn’t Reduce Work, It Intensifies It
A nine-month study of 200 employees at a U.S. technology company, published in the Harvard Business Review, found that AI tools increased productivity but created an unsustainable pace of work. Rather than freeing up time, AI introduced what researchers described as constant context-switching: employees managed multiple AI-generated threads simultaneously, ran parallel agents, and revived deferred tasks because AI could “handle them” in the background.
Workers reported feeling like they had a “partner” that created momentum, but the reality was perpetual attention-splitting, frequent output-checking, and a growing backlog of open tasks. The cognitive load of managing AI assistants proved exhausting even as the volume of completed work increased.
Simon Willison, whose blog has become a clearinghouse for grounded AI analysis, noted that the findings matched his own experience. “I’m frequently finding myself with work on two or three projects running parallel. I can get so much done, but after just an hour or two my mental energy for the day feels almost entirely depleted.”
The researchers called on organizations to develop structured “AI practices” to prevent burnout and distinguish genuine productivity gains from unsustainable intensity. It’s a rare acknowledgment that AI’s real-world impact on workers isn’t just about job displacement - it’s about what happens to the people who keep their jobs but can’t stop working.
Sources: Harvard Business Review, Simon Willison
Quick Hits
- No company in New York has admitted to replacing workers with AI: For nearly a year, New York state has required companies to disclose whether “technological innovation or automation” caused job losses. So far, not a single company has filed such a disclosure - raising questions about whether the law is working as intended or being quietly ignored. Wired
- AI may replace nuclear arms treaties: With the last major US-Russia nuclear arms treaty expired, some experts propose satellite surveillance and AI monitoring as substitutes for formal agreements. Others call the approach dangerously inadequate for something with existential stakes. Wired
- Databricks CEO: SaaS isn’t dead, but AI makes it irrelevant: Ali Ghodsi pushed back on the “SaaS is dead” narrative, arguing that AI won’t replace major SaaS apps with vibe-coded alternatives - but could enable new competitors that render existing products obsolete through fundamentally different approaches. TechCrunch
- Anthropic hits trademark dispute in India: A local Indian software company called Anthropic Software has taken the US AI giant to court over its name as Anthropic expands operations in the country. TechCrunch
- EU warns Meta on AI chatbot competition: The European Union told Meta it cannot block rival AI assistants on WhatsApp, the latest volley in Brussels’ ongoing effort to enforce interoperability requirements against Big Tech platforms. Bloomberg
- Context engineering paper tests 11 models across 10,000-table schemas: A new paper found that frontier models (Opus 4.5, GPT-5.2, Gemini 2.5 Pro) significantly outperformed open-source models on complex SQL tasks, and that an unfamiliar data format called TOON consumed up to 740% more tokens than YAML as models struggled to parse it. Simon Willison
Worth Watching
The ChatGPT ads story is the one to track. OpenAI’s move confirms what many suspected: the free tier of AI chatbots will eventually be subsidized by advertising, just as search was before it. The question is whether ads in AI conversations change the fundamental trust relationship between users and these systems. When you ask a chatbot for a product recommendation and see a sponsored link at the bottom, how do you know the answer wasn’t shaped by the ad, even if OpenAI promises otherwise? This is different from a Google search results page, where ads are visually separated from organic results. In a conversational interface, the line between recommendation and advertisement is inherently blurry.
The Oxford medical study also deserves sustained attention. The 60-percentage-point gap between benchmark performance (94.9%) and real-world accuracy (34.5%) is one of the starkest demonstrations yet of why AI benchmarks are poor predictors of real-world utility. Models that ace medical exams fail at actually being doctors because medicine isn’t about knowing answers - it’s about knowing which questions to ask. Anyone citing benchmark scores to justify deploying AI in healthcare should have to explain this gap first.