OpenAI Releases GPT-5.4 With Native Computer Control and 1 Million Token Context

OpenAI's latest model can autonomously control your desktop, navigate apps, and execute multi-step workflows. The 1M token context window is the largest OpenAI has ever offered - and the security implications are just as large.

OpenAI released GPT-5.4 yesterday, and it’s the company’s most significant update since GPT-5. The model introduces native computer use - it can control your desktop, navigate applications, and execute multi-step workflows autonomously. The API version supports up to 1 million tokens of context, the largest window OpenAI has ever offered.

But this isn’t just a capability upgrade. It’s a fundamental shift in what AI assistants can do, and what risks they introduce.

Three Models, Three Price Points

GPT-5.4 arrives in three variants:

GPT-5.4 Standard: The base model for API developers. $2.50 per million input tokens, $15 per million output tokens.

GPT-5.4 Thinking: A reasoning-focused version available in ChatGPT Plus, Team, and Pro subscriptions. Designed for problems requiring multi-step logic.

GPT-5.4 Pro: Maximum performance tier at $30/$180 per million tokens (input/output). Built for complex professional workloads.

One catch with pricing: if a prompt exceeds 272,000 input tokens, you pay 2x the input rate and 1.5x the output rate for the entire session. Long-context use cases will cost more than the headline numbers suggest.
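The surcharge math above can be sketched as a quick cost estimator. The rates are the Standard-tier numbers quoted earlier; the assumption that the multipliers apply to the whole session once the threshold is crossed follows the description above, and actual billing rules may differ.

```python
# Sketch of GPT-5.4 Standard cost estimation based on the pricing above.
# Assumption: the long-context surcharge (2x input, 1.5x output past
# 272K input tokens) applies to the entire session, as described.

INPUT_RATE = 2.50 / 1_000_000    # $ per input token (Standard tier)
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (Standard tier)
LONG_CONTEXT_THRESHOLD = 272_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one GPT-5.4 Standard session."""
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if over else (1.0, 1.0)
    cost = (input_tokens * INPUT_RATE * in_mult
            + output_tokens * OUTPUT_RATE * out_mult)
    return round(cost, 4)

# A 100K-token prompt stays under the threshold...
print(estimate_cost(100_000, 5_000))  # 0.325
# ...while a 500K-token prompt pays the surcharge on everything.
print(estimate_cost(500_000, 5_000))  # 2.6125
```

Note how the 500K-token request costs roughly eight times the 100K-token one despite only five times the input - the surcharge compounds with the larger volume.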

Native Computer Use

This is the big change. GPT-5.4 can interact with your computer interface - issuing keyboard and mouse commands based on screenshots it processes. It navigates between applications, completes multi-step workflows, and operates autonomously across your desktop.
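The screenshot-and-act loop described above can be shown in schematic form: the model inspects the current screen, proposes an input event, and the loop applies it until the task is done. Everything here - the model stub, the event format - is illustrative, not OpenAI's actual interface.

```python
# Schematic perceive-decide-act loop for a computer-use agent.
# model_step is a stand-in for the model; in a real system it would
# receive an actual screenshot and return a keyboard/mouse event.

def model_step(screenshot: str, goal: str) -> dict:
    """Stand-in for the model: maps screen state to the next UI event."""
    if "login form" in screenshot:
        return {"type": "type", "text": "user@example.com"}
    return {"type": "done"}  # nothing left to do

def run_agent(goal: str, max_steps: int = 10) -> list[dict]:
    """Run the loop until the model signals completion or steps run out."""
    screen = "login form visible"
    events = []
    for _ in range(max_steps):
        event = model_step(screen, goal)
        events.append(event)
        if event["type"] == "done":
            break
        screen = "form filled"  # in reality: re-capture the screen
    return events

print(run_agent("fill in the login form"))
```

The `max_steps` cap matters in practice: without it, an agent that misreads the screen can loop indefinitely, repeating the same click or keystroke.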

OpenAI says this makes GPT-5.4 function as a “workplace agent” rather than just an advisory tool. It set a record on the OSWorld-Verified benchmark, scoring 75% and surpassing the 72.4% human baseline for desktop tasks.

The model also hit record scores on WebArena Verified (web navigation tasks) and introduced a new feature called Tool Search. Instead of loading all available tool definitions into context, the model queries for relevant tools only when needed - reducing token costs and improving efficiency.
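The Tool Search idea described above can be illustrated with a toy registry: instead of sending every tool definition with each request, the agent keeps a searchable index and loads only the definitions relevant to the task. This is a conceptual sketch of the pattern, not OpenAI's actual API; the tool names and keyword-matching scheme are invented for illustration.

```python
# Toy illustration of the Tool Search pattern: match the task description
# against keyword sets and load only the relevant tool definitions,
# instead of putting every definition into the model's context.

TOOL_REGISTRY = {
    "send_email":   {"keywords": {"email", "send", "message"}},
    "query_db":     {"keywords": {"database", "sql", "query"}},
    "resize_image": {"keywords": {"image", "resize", "photo"}},
}

def search_tools(task: str, registry=TOOL_REGISTRY) -> list[str]:
    """Return names of tools whose keywords overlap the task description."""
    words = set(task.lower().split())
    return [name for name, spec in registry.items()
            if spec["keywords"] & words]

# Only the matching tool definition enters the model's context.
print(search_tools("query the sales database for Q3"))  # ['query_db']
```

With hundreds of registered tools, this kind of lookup is what turns a context-budget problem into a retrieval problem - the token savings scale with the size of the catalog.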

The Context Window Race

The 1 million token context window puts OpenAI ahead of Anthropic but behind Google. Here’s where things stand:

  • Gemini 3.1 Pro: 2 million tokens
  • GPT-5.4: 1 million tokens (API only, 272K standard)
  • Claude Opus 4.6: 200K standard, 1M in beta

For reference, 1 million tokens is roughly 750,000 words - about 12 average novels or an entire mid-sized codebase. In practice, this means developers can analyze complete codebases without chunking, lawyers can process entire case files, and researchers can work with full document collections.
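The back-of-the-envelope figures above come from the common heuristic of roughly 0.75 English words per token - an approximation that varies with language and content - and a mid-range novel length of around 62,500 words:

```python
# Rough arithmetic behind the "750,000 words / 12 novels" figures above.
# WORDS_PER_TOKEN is the usual ~0.75 heuristic for English prose;
# AVG_NOVEL_WORDS is an assumed midpoint for an average-length novel.

TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75
AVG_NOVEL_WORDS = 62_500

words = int(TOKENS * WORDS_PER_TOKEN)
print(words)                           # 750000
print(round(words / AVG_NOVEL_WORDS))  # 12
```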

Benchmark Results

OpenAI claims substantial improvements:

  • 83% on GDPval (knowledge work benchmark testing performance across 44 professional occupations)
  • 33% fewer factual errors per claim compared to GPT-5.2
  • 18% fewer overall response errors
  • Record scores on APEX-Agents (professional domains including law and finance)

How does it compare to competitors? According to benchmark aggregators, no model wins everything:

  • GPT-5.4 leads in 5 benchmark categories (particularly computer use and knowledge work)
  • Gemini 3.1 Pro leads in 4 categories (abstract reasoning at 77.1% ARC-AGI-2)
  • Claude Opus 4.6 leads in 3 categories (software engineering at 80.8% SWE-bench)

The Security Problem No One Solved

When an AI can control your computer, the threat model changes completely.

The immediate risks are obvious: misconfigured permissions, accidental data exposure, unintended actions in sensitive systems. But the deeper problem is accountability. When an autonomous agent executes a 10-step workflow across multiple applications, who’s responsible if something goes wrong?

Security researchers have identified several concerns:

Access control: If GPT-5.4 can navigate your entire desktop, how do you prevent it from accessing files or applications it shouldn’t? Traditional software permissions weren’t designed for AI agents that can screenshot and click through anything.

Audit trails: Multi-step autonomous workflows are harder to monitor than single API calls. What did the agent actually do? Which documents did it view? Where did it send data?

Attack surface: AI agents inside corporate networks become targets. A compromised agent with computer access could exfiltrate data, modify files, or take destructive actions.
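One way to approach the access-control and audit-trail concerns above is to route every agent action through a wrapper that logs it and checks it against an allowlist before anything executes. This is a minimal sketch; the action names and policy are hypothetical and not part of any GPT-5.4 interface.

```python
# Minimal audit-and-allowlist wrapper for agent actions: every action is
# recorded with a timestamp and target, and anything outside the
# allowlist is rejected. Action names here are hypothetical.

import time

ALLOWED_ACTIONS = {"click", "type", "screenshot", "open_app"}
audit_log: list[dict] = []

def execute_action(action: str, target: str) -> bool:
    """Record the action; return whether the allowlist permits it."""
    entry = {
        "ts": time.time(),
        "action": action,
        "target": target,
        "allowed": action in ALLOWED_ACTIONS,
    }
    audit_log.append(entry)
    return entry["allowed"]

execute_action("click", "Save button")          # permitted and logged
execute_action("delete_file", "/etc/passwd")    # logged, but rejected
print([(e["action"], e["allowed"]) for e in audit_log])
```

Even a wrapper this simple answers the auditing questions posed above - what the agent did, on what target, and when - and denied attempts stay in the log, which is often the most valuable signal.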

OpenAI says it treats GPT-5.4 as “high cyber capability” and has deployed corresponding protections - monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests. But it hasn’t published specifics about how these safeguards work or what constraints exist on computer use functionality.

What This Means

GPT-5.4 represents a real capability jump. Native computer control, massive context windows, and improved accuracy make it genuinely useful for complex professional work.

But OpenAI is racing ahead of the security infrastructure needed to safely deploy these capabilities. Enterprises adopting GPT-5.4’s computer use features will need to build their own guardrails - access controls, monitoring, sandboxing - because the model itself doesn’t enforce them.

The 1 million token context is useful. The pricing is competitive. But the computer use feature demands caution. Giving an AI autonomous control over your desktop is a decision that shouldn’t be made casually, regardless of how impressive the benchmarks look.

The Bottom Line

GPT-5.4 is a capable model that finally gives OpenAI feature parity with Anthropic and Google on extended context. The computer use capabilities are impressive but risky. If you’re considering enterprise deployment, focus on access controls and monitoring before exploring automation. The productivity gains are real - but so are the security implications.