Anthropic’s Model Context Protocol promised to be the USB standard for AI - one connector to rule them all. But when production requirements hit, the promise starts cracking. At Ask 2026 on March 11, Perplexity CTO Denis Yarats announced his company is moving away from MCP in favor of traditional APIs and command-line tools.
The shift matters because Perplexity isn’t abandoning a niche protocol. They’re walking away from what was supposed to be the future of AI interoperability - and they’re not alone.
The Problem: 67,000 Tokens Before Work Begins
MCP’s fundamental issue is token overhead. The protocol requires loading complete tool schemas, descriptions, and metadata into the model’s context for every interaction. For a typical setup with seven MCP servers, that’s 67,300 tokens consumed - 33.7% of a 200k context window - before the agent does anything useful.
Some deployments report losing 72% of available context to MCP infrastructure alone.
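The arithmetic is easy to verify. A quick sketch using the figures above (token counts taken from the article; the exact 33.65% rounds to the quoted 33.7%):

```python
# Context-window overhead from MCP tool schemas, using the article's figures.
CONTEXT_WINDOW = 200_000   # typical 200k-token context window
MCP_OVERHEAD = 67_300      # schema/metadata tokens for a 7-server setup

overhead_pct = MCP_OVERHEAD / CONTEXT_WINDOW * 100
remaining = CONTEXT_WINDOW - MCP_OVERHEAD

print(f"Overhead: {overhead_pct:.2f}% of context")  # -> 33.65%
print(f"Tokens left for actual work: {remaining}")  # -> 132700
```

At the 72% figure some deployments report, the same window leaves only 56,000 tokens for the task itself.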
“MCP interactions require loading extensive tool schemas, descriptions, traces, and metadata into the model’s context for every call,” explains one technical analysis. This consumes “tens of thousands of tokens per interaction,” driving up costs and latency in complex workflows.
The problem compounds. Organizations rarely connect just one MCP server. They connect five, six, or more - each adding its own schema overhead. As conversations grow, agents spend increasing effort deciding what tools not to use rather than accomplishing tasks.
Authentication Is a Mess
Beyond token overhead, MCP’s authentication model creates friction. Each MCP server manages its own authentication flows, breaking consistency across services.
Traditional REST APIs benefit from “decades of mature tooling for authentication and observability” that MCP cannot match at scale. Enterprise deployments need OAuth flows, granular permissions, rate limiting, audit logs, and credential management. MCP’s default patterns don’t provide these out of the box.
The reliability gap widens in production. MCP’s stdio transport works locally but introduces “unpredictability, higher latency, and monitoring difficulties in distributed or production environments.” Direct API calls produce deterministic outcomes. Agent-driven tool selection through MCP introduces inconsistency.
Perplexity’s Alternative: The Agent API
Rather than wrapping MCP around existing infrastructure, Perplexity inverted the architecture. Their Agent API provides:
- Single endpoint routing to OpenAI, Anthropic, Google, xAI, NVIDIA, and Perplexity models
- Built-in tools: web search ($0.005/call), URL fetch ($0.0005/call), function calling (free)
- OpenAI SDK-compatible format - switch by changing base URL and API key
- No tool schema bloat in system prompts
- Direct provider pricing with no markup
The design positions APIs as the foundational primitive. Agents operate directly against proven interfaces rather than through an intermediary protocol layer.
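What "OpenAI SDK-compatible" means in practice is that the request body follows the standard chat-completions shape, so only the base URL and API key change between providers. A minimal sketch (the endpoint path and model name here are assumptions for illustration, not taken from Perplexity's documentation):

```python
import json
import urllib.request

# Swap these two values to move between providers; the request body
# stays in the standard OpenAI chat-completions format.
BASE_URL = "https://api.perplexity.ai"  # assumed Agent API base URL
API_KEY = "pplx-..."                    # your API key

payload = {
    "model": "sonar",  # assumed model name
    "messages": [
        {"role": "user", "content": "Summarize the MCP token-overhead debate."}
    ],
}

# Build (but don't send) an OpenAI-style chat-completions request.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)  # -> https://api.perplexity.ai/chat/completions
```

Because the wire format matches, code already written against the OpenAI SDK can target this endpoint by changing `base_url` and `api_key` alone - no tool schemas land in the system prompt.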
The Broader Retreat
Perplexity isn’t alone. Y Combinator CEO Garry Tan built a CLI rather than using MCP, citing reliability and speed concerns. In benchmarks, CLI tools demonstrate 33% better token efficiency than MCP in production scenarios, and code-execution approaches show token reductions of up to 98% compared to traditional MCP implementations.

The pattern is clear: when production requirements intensify, teams gravitate toward APIs and CLIs over protocol-level abstractions. MCP’s dynamic tool discovery - letting agents find and use tools without explicit programming - matters less than predictable costs and reliable execution.
MCP’s Response: The 2026 Roadmap
The official MCP roadmap for 2026 focuses on four priorities: transport evolution with stateless operation and load balancer compatibility, agent communication improvements, governance maturation, and enterprise readiness.
Notably absent: any direct response to context window concerns, token efficiency, or the authentication criticisms driving enterprise users away. The roadmap acknowledges “open-standards work rarely has” predictable timelines but offers no concrete mitigation for the overhead problem.
The November 2025 spec update added OAuth 2.1 authorization and structured tool annotations, but these address symptoms rather than the core architectural issue of context consumption.
What MCP Gets Right
MCP’s defenders make valid points. The protocol enables dynamic tool discovery - agents can identify and use tools at runtime without explicit programming. Static API integrations cannot replicate this capability.
For rapid prototyping and open-ended systems requiring runtime adaptation, MCP still offers advantages. The protocol shines when you don’t know in advance which tools an agent might need, when you want tools to self-describe their capabilities, or when you’re building systems that need to adapt to new integrations without code changes.
MCP Server Cards, a planned feature for exposing structured server metadata via .well-known URLs, could help with discovery overhead. The question is whether improvements arrive before enterprise adoption stalls.
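Server Cards are still a planned feature, so any concrete shape is speculative. As a sketch, a `.well-known` discovery fetch might look something like this (the `mcp-server-card` path and the card's contents are hypothetical, not from the spec):

```python
import json
import urllib.request
from urllib.parse import urljoin

def server_card_url(origin: str) -> str:
    """Build a hypothetical .well-known URL for an MCP server card."""
    return urljoin(origin, "/.well-known/mcp-server-card")  # path is assumed

def fetch_server_card(origin: str) -> dict:
    """Fetch and parse the card, letting a client inspect tool metadata
    without loading full tool schemas into the model's context."""
    with urllib.request.urlopen(server_card_url(origin)) as resp:
        return json.load(resp)

print(server_card_url("https://example-mcp.internal"))
# -> https://example-mcp.internal/.well-known/mcp-server-card
```

The appeal is that discovery happens out-of-band over plain HTTP, so the model's context only pays for the tools the agent actually selects.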
The Emerging Hybrid Approach
The consensus for 2026 involves architectural flexibility rather than protocol orthodoxy:
- Skills for documentation (30-60% token savings through schema deduplication)
- CLI tools for local operations
- APIs for managed production runtimes
- MCP reserved for scenarios requiring dynamic discovery
This isn’t MCP’s death. It’s a repositioning - from universal standard to specialized tool. The protocol remains valuable for development, experimentation, and systems requiring runtime flexibility. Production deployments with tight cost and reliability requirements are moving elsewhere.
The Bottom Line
MCP’s promise of universal AI connectivity hit the wall that every abstraction layer eventually hits: overhead. When a protocol consumes a third or more of your context window before work begins - nearly three-quarters in the worst reported deployments - the convenience stops being convenient. Perplexity’s retreat signals that the “USB for AI” vision needs significant evolution before enterprise users will trust it with production workloads.