OpenAI Just Deployed Its First AI Model on Non-NVIDIA Chips

GPT-5.3-Codex-Spark runs on Cerebras' wafer-scale chips at 1,000+ tokens per second. It's OpenAI's first production break from NVIDIA - and it won't be the last.

For the first time in its history, OpenAI is running a production AI model on chips that aren’t made by NVIDIA.

GPT-5.3-Codex-Spark, released February 12, is a streamlined coding model designed for real-time, interactive development. What makes it notable isn’t the model itself - it’s what’s underneath. Codex-Spark runs on hardware from Cerebras Systems, a chip startup whose approach to AI silicon looks nothing like NVIDIA’s.

The move is part of a broader, deliberate strategy by OpenAI to break its dependency on a single chip supplier. And if you follow the money, it’s clear this is just the beginning.

The Chip That Isn’t a GPU

Cerebras doesn’t make GPUs. It makes something far stranger: a single processor fabricated across an entire silicon wafer.

The Wafer Scale Engine 3 (WSE-3) is built on TSMC’s 5nm process and packs roughly 4 trillion transistors, 900,000 AI-optimized cores, and 44 GB of on-chip SRAM into a single 300mm wafer-sized die. For context, NVIDIA’s H100 has about 80 billion transistors. Cerebras puts roughly 50 times that on one chip.
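The scale gap is easy to sanity-check with the figures above:

```python
# Transistor counts cited in the article.
wse3_transistors = 4e12   # Cerebras WSE-3: ~4 trillion
h100_transistors = 80e9   # NVIDIA H100: ~80 billion

ratio = wse3_transistors / h100_transistors
print(f"WSE-3 has ~{ratio:.0f}x the transistors of an H100")
# → WSE-3 has ~50x the transistors of an H100
```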

The design philosophy is fundamentally different from the GPU cluster approach. Instead of connecting thousands of smaller chips over high-speed interconnects - the standard recipe for AI inference - Cerebras keeps everything on one massive piece of silicon. Data doesn’t have to hop between chips, which eliminates the latency bottleneck that plagues GPU clusters for interactive workloads.

The result: Codex-Spark delivers over 1,000 tokens per second with a 50% reduction in time-to-first-token compared to GPU-based inference. OpenAI describes it as responsive enough to feel like “a human pair programmer.”
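A rough back-of-the-envelope calculation shows why those two numbers compound for interactive use. Only the 1,000 tokens-per-second figure and the 50% time-to-first-token reduction come from the article; the GPU baseline of 150 tokens per second, the 0.5-second baseline TTFT, and the 300-token response size are illustrative assumptions, not measured figures:

```python
def response_time(tokens, tokens_per_sec, ttft_sec):
    """Wall-clock time to stream a full response:
    time-to-first-token plus per-token generation time."""
    return ttft_sec + tokens / tokens_per_sec

# Assumed baseline vs. the article's Codex-Spark figures.
gpu   = response_time(tokens=300, tokens_per_sec=150,  ttft_sec=0.50)
spark = response_time(tokens=300, tokens_per_sec=1000, ttft_sec=0.25)

print(f"GPU baseline: {gpu:.2f}s, Codex-Spark: {spark:.2f}s")
# → GPU baseline: 2.50s, Codex-Spark: 0.55s
```

Under these assumptions a 300-token edit drops from feeling like a pause to feeling near-instant, which is the difference that matters for an in-editor tool.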

Why OpenAI Chose This Model for the Break

The choice of Codex-Spark as the debut Cerebras product wasn’t arbitrary. This is a model where latency matters more than throughput.

Regular Codex handles big, batch-oriented coding tasks - generate an entire module, run tests, iterate. Codex-Spark is the opposite: small, fast, interactive. It makes minimal, targeted edits and doesn’t automatically run tests unless asked. It’s the kind of tool developers leave running in their IDE for real-time suggestions as they type.

That use case plays directly to Cerebras’ architecture. As one analysis noted, Spark would be “economically impractical” to serve on NVIDIA’s Blackwell architecture, which is optimized for batch processing rather than ultra-low-latency single-request inference.

Sachin Katti, OpenAI’s Head of Industrial Compute, framed it diplomatically: “Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability.” He described it as bringing “wafer-scale compute into production” for “latency-sensitive work.”

In other words: NVIDIA is great for training and batch inference. But for the snappiest real-time responses, OpenAI found something better.

The $10 Billion Signal

The Codex-Spark deployment didn’t come out of nowhere. In January, OpenAI signed a multi-year deal worth over $10 billion with Cerebras, reportedly covering up to 750 megawatts of computing power over three years.

That’s a staggering commitment for a company that has publicly described its relationship with NVIDIA as “foundational.” But it’s not the only non-NVIDIA deal OpenAI has struck recently:

  • A six-gigawatt agreement with AMD to deploy Instinct AI GPUs starting in the second half of 2026
  • A multi-year partnership with Broadcom to co-develop custom AI accelerators

Taken together, OpenAI is building a multi-vendor chip strategy where NVIDIA remains the core but is no longer the only option. Benchmark, the VC firm, apparently agrees with the thesis - it just raised $225 million in special vehicles to increase its Cerebras position, participating in a $1 billion Series H that valued the company at $23 billion. Cerebras is targeting a Q2 2026 IPO.

What This Means for NVIDIA’s Grip

NVIDIA still dominates AI. Its GPUs train virtually every frontier model, and its CUDA ecosystem creates switching costs steep enough to keep most customers firmly locked in. Nothing about the Cerebras deal changes that overnight.

But the deployment reveals a crack in the narrative that NVIDIA hardware is the only viable option for production AI workloads. It’s a narrow crack - inference only, specific use cases, one model - but the implications are real.

The AI chip market is entering a phase where different architectures serve different needs. GPU clusters for training and batch inference. Wafer-scale chips for ultra-low-latency interactive use. Custom ASICs (the Broadcom partnership) for whatever OpenAI decides needs purpose-built silicon. And AMD’s Instinct line as a cost-competitive alternative for general GPU compute.

If this sounds familiar, it should. The server CPU market went through the same diversification when ARM chips (AWS Graviton, Ampere) started eating into Intel’s datacenter monopoly. Intel still sells plenty of chips, but the days of 90%+ market share are over. NVIDIA may be watching the same pattern begin.

What It Means for You

For developers using Codex-Spark today, the chip underneath doesn’t matter much. The model is rolling out as a research preview for ChatGPT Pro users through the Codex app, CLI, and VS Code extension. It’s text-only, 128k context, and fast.
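For a rough sense of what a 128k-token window holds, here is a quick estimate. The 10-tokens-per-line figure is a loose rule of thumb that varies widely by language and style, not anything from the article:

```python
context_tokens = 128_000   # Codex-Spark's context window, per the article
tokens_per_line = 10       # assumption: rough average for source code

lines_of_code = context_tokens // tokens_per_line
print(f"~{lines_of_code:,} lines of code fit in context")
# → ~12,800 lines of code fit in context
```

In other words, a window that size can plausibly hold a mid-sized module or several related files at once, which fits the real-time, in-IDE use case.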

The bigger story is about what happens next. Cerebras says it plans to bring its ultra-fast inference capability to “the largest frontier models in 2026.” If a stripped-down coding model was the test case, imagine what happens when they scale to GPT-5-class general reasoning at the same latency.

For the AI industry, the message is clear: the era of single-supplier dependency is ending. That’s good for competition, good for pricing, and good for the companies building on this infrastructure. Whether it’s good for NVIDIA’s stock price is a different question.