AI agents are deleting inboxes, leaking credentials, and executing malicious code. Niels Provos watched the chaos unfold and decided to build something different.
IronCurtain, released last week, is an open source security framework that treats AI agents as fundamentally untrusted. Where other systems assume agents will behave, IronCurtain assumes they won’t - and enforces boundaries accordingly.
“Services like OpenClaw are at peak hype right now,” Provos told Wired, “but my hope is that there’s an opportunity to say, ‘Well, this is probably not how we want to do it.’”
The Problem IronCurtain Addresses
The numbers tell the story. Security researchers identified 42,665 exposed OpenClaw instances earlier this month. Of those, 5,194 were verified vulnerable, with 93.4% exhibiting authentication bypass conditions. Over 800 malicious skills - roughly 20% of the registry - have been documented, with 335 traced to a single coordinated attack campaign.
These aren’t theoretical risks. Cisco’s State of AI Security 2026 report found that 88% of organizations experienced confirmed or suspected AI agent security incidents. Only 29% felt ready to secure their deployments.
The core vulnerability is what security researchers call “ambient authority.” AI agents typically inherit their user’s full privileges - access to files, email, APIs, credentials. When agents drift from instructions or get hijacked via prompt injection, they can do anything the user could do. And unlike humans, they don’t ask “are you sure?” before executing destructive commands.
How IronCurtain Works
IronCurtain’s architecture treats the agent as a hostile actor contained within a sandbox. The framework implements four layers of trust:
1. Agent Layer: The AI writes TypeScript code to accomplish tasks, but has no direct access to anything outside its sandbox.
2. Sandbox Layer: A V8 isolate executes the agent’s code. The only thing code can do is produce typed function calls that map to MCP (Model Context Protocol) operations. No filesystem access. No network access. No environment variables.
3. Trusted Process: A policy engine evaluates every MCP request against compiled rules. Each request gets one of three verdicts: allow, deny, or escalate to a human.
4. MCP Servers: Only approved calls reach the actual tools - filesystem operations, git commands, web fetches. The servers never see requests the policy engine rejected.
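The four-layer flow above can be pictured in a short sketch. Everything here is illustrative: the type names, the `dispatch` function, and the stub rules are assumptions, not IronCurtain's actual interfaces; only the allow/deny/escalate verdicts and the sandbox-to-policy-to-server ordering come from the design described above.

```typescript
// Sketch of the four-layer flow (hypothetical names; IronCurtain's real
// interfaces differ). Agent code in the sandbox can only emit typed MCP
// calls; the trusted process gates each one before any tool runs.
type Verdict = "allow" | "deny" | "escalate";

interface McpCall {
  tool: string; // e.g. "fs.read", "git.push"
  args: Record<string, string>;
}

// Layer 3: policy engine. A stub here; real verdicts come from rules
// compiled out of a plain-English constitution.
function policyVerdict(call: McpCall): Verdict {
  return call.tool === "git.push" ? "escalate" : "allow";
}

// Layer 4 stub: the MCP server only ever sees approved calls.
function runTool(call: McpCall): string {
  return `executed ${call.tool}`;
}

// The trusted process: the single path from sandbox to tools.
// Escalations go to a human; anything denied never reaches a server.
function dispatch(call: McpCall, askHuman: (c: McpCall) => boolean): string {
  const verdict = policyVerdict(call);
  if (verdict === "deny") return "denied";
  if (verdict === "escalate" && !askHuman(call)) return "denied";
  return runTool(call);
}
```

The key property is structural: `runTool` is only reachable through `dispatch`, so a rejected request never touches a tool, no matter what the agent's code tried to do.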
For external agents like Claude Code or OpenClaw’s Goose, IronCurtain offers Docker Mode. The agent runs in a container with --network=none and no elevated privileges. All external communication routes through two controlled channels: an MCP proxy for tool calls and a TLS-terminating proxy for LLM API requests.
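A Docker Mode launch along these lines might assemble arguments like the sketch below. Only --network=none is described above; the privilege-dropping flags and image name are illustrative assumptions about how "no elevated privileges" could be enforced, not IronCurtain's actual configuration.

```typescript
// Hypothetical: build the docker-run arguments for an isolated agent
// container. Only --network=none comes from the article; the other flags
// are one common way to drop privileges, not IronCurtain's exact config.
function agentContainerArgs(image: string): string[] {
  return [
    "run",
    "--rm",
    "--network=none", // no direct network access; traffic goes via proxies
    "--cap-drop=ALL", // assumed: drop all Linux capabilities
    "--security-opt", "no-new-privileges", // assumed: block privilege escalation
    image,
  ];
}
```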
Policy as Plain English
The most distinctive feature is how IronCurtain handles policy. Instead of writing security rules in a domain-specific language, administrators write “constitutions” in plain English:
“The agent may read and write files in the project directory, may perform read-only git operations without approval, and must ask permission before pushing to remote repositories.”
The framework compiles these constitutions into deterministic JSON rules through a five-stage pipeline: annotate tool arguments by role, convert natural language to symbolic rules, resolve dynamic categories, generate test scenarios, then verify and repair through automated testing.
The result is human-auditable policy that machines can enforce. When an agent requests an action, the policy engine evaluates it against the compiled rules - not by asking the LLM “is this okay?”
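As an illustration, the constitution quoted above might compile into rules along the lines below. The schema, field names, and tool identifiers are invented for this example; the article only says the output is deterministic JSON evaluated without consulting the LLM.

```typescript
// Hypothetical compiled form of the example constitution. The rule schema
// is invented for illustration; only the allow/deny/escalate verdicts and
// the deterministic-JSON output are described in the article.
type Verdict = "allow" | "deny" | "escalate";

interface CompiledRule {
  tools: RegExp; // MCP operations the rule covers
  verdict: Verdict;
}

const compiled: CompiledRule[] = [
  { tools: /^fs\.(read|write)$/, verdict: "allow" },       // project files
  { tools: /^git\.push$/, verdict: "escalate" },           // ask before pushing
  { tools: /^git\.(status|log|diff)$/, verdict: "allow" }, // read-only git ops
];

// Deterministic lookup: first matching rule wins, unmatched tools are
// denied. No model is in the loop at enforcement time.
function check(tool: string): Verdict {
  const rule = compiled.find((r) => r.tools.test(tool));
  return rule ? rule.verdict : "deny";
}
```

Note the default: a tool the constitution never mentions falls through every rule and is denied, which is what makes the compiled policy auditable as a closed list.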
Credentials Never Touch the Agent
IronCurtain’s credential handling addresses a critical weakness in current agent designs. In Docker Mode, containers receive fake API keys that look valid but are useless on their own. A man-in-the-middle (MITM) proxy intercepts outbound requests, verifies the fake credential, swaps it for the real one, and forwards the call upstream.
The agent never sees actual secrets. Even if prompt injection compromises the model, the credentials aren’t there to steal.
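The swap can be sketched as follows. The header name, key formats, and vault mapping are all invented for the example; the source only establishes that the proxy recognizes the fake credential and substitutes the real one before forwarding.

```typescript
// Hypothetical sketch of the credential swap. The container holds only a
// fake key; the proxy recognizes it and substitutes the real one before
// forwarding upstream. Key formats and the vault map are invented here.
const vault = new Map<string, string>([
  // fake key seen by the agent -> real key held only by the proxy
  ["ic-fake-abc123", "sk-real-secret-key"],
]);

interface OutboundRequest {
  url: string;
  headers: Record<string, string>;
}

function rewriteCredential(req: OutboundRequest): OutboundRequest {
  const presented = (req.headers["authorization"] ?? "").replace("Bearer ", "");
  const real = vault.get(presented);
  // Unknown credentials are refused outright: a compromised agent can't
  // tunnel arbitrary requests through the proxy.
  if (!real) throw new Error("unknown credential: refusing to forward");
  // Only the rewritten copy leaves the proxy toward the API; the agent's
  // side of the connection never sees the real key.
  return { ...req, headers: { ...req.headers, authorization: `Bearer ${real}` } };
}
```

Because the real key exists only inside the proxy process, exfiltrating the container's environment or memory yields nothing an attacker can use elsewhere.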
Who Built This
Provos brings serious credentials to the problem. As a Distinguished Engineer at Google from 2003 to 2018, he managed security engineering teams and contributed to Safe Browsing and DDoS defense infrastructure. He co-authored bcrypt, the password-hashing algorithm used by millions of systems, and wrote Honeyd (a honeypot framework) and libevent (an event-notification library).
From 2018 to 2022, he served as head of security at Stripe. He then moved to Lacework as head of Security Efficacy before turning his attention to AI agent security.
He sought early feedback from Dino Dai Zovi - co-founder of Trail of Bits, Pwn2Own winner, and currently head of Applied Security Engineering at Block - and Michal Zalewski, another legendary security researcher.
“With something like IronCurtain,” Dai Zovi noted, “capabilities…can actually be outside the reach of the LLM, where the agent can’t do something no matter what.”
What IronCurtain Can’t Do
Provos is explicit about limitations. IronCurtain is a research prototype, not a production-ready product. The Claude Code integration works but isn’t complete. The system can constrain what agents do, but it can’t prevent prompt injection from making agents want to do bad things - it just ensures they can’t act on those desires.
“LLMs drift from their instructions over multi-turn conversations even without adversarial input,” Provos writes. Policy denials provide corrective signals, essentially pushing the model back toward its original constraints when it starts to wander.
The theatrical metaphor behind the name captures the philosophy: “An iron curtain is a fireproof safety barrier. The agent performs on stage. Your files, credentials, your systems are in the audience.”
Why This Matters
The agent security crisis isn’t slowing down. As enterprises rush to deploy agentic AI - 83% planned to do so in 2025, according to Cisco - the gap between adoption and security readiness widens.
Current approaches assume we can make agents safe through better training, careful prompting, or trust boundaries implemented in the agent itself. IronCurtain represents the opposite philosophy: assume the agent is compromised and build security around that assumption.
For users running local AI agents who want actual control over what those agents can do, IronCurtain offers a model worth watching. The code is open source under Apache 2.0. Whether it evolves into a production tool or inspires better approaches from larger players, it demonstrates what security-first agent design looks like.
The Bottom Line
IronCurtain treats AI agents like untrusted code running in a sandbox, not helpful assistants deserving of full system access. Given the documented track record of AI agents deleting data, leaking credentials, and executing malicious instructions, that assumption seems increasingly reasonable.