IronCurtain: The Open-Source 'Firewall' for AI Agents That Might Actually Work

A veteran Google security engineer built a sandbox system that treats AI agents as fundamentally untrusted - and it could be the model for safe agent deployment.

AI agents are getting more powerful - and more dangerous. OpenClaw, the viral open-source agent, was found leaking over 40,000 users’ API keys and conversation histories. Prompt injection attacks routinely hijack agents into doing attackers’ bidding. And these tools are being deployed everywhere, often with full access to users’ files, credentials, and systems.

Enter IronCurtain, an open-source project that takes a different approach: treat the AI agent as fundamentally untrusted, and enforce security at the boundary.

Who Built It

Niels Provos isn’t some random developer experimenting with AI safety. He’s the creator of bcrypt (the password-hashing algorithm still in wide use across the internet), the founder of Google’s Safe Browsing system, and former head of security at both Stripe and Lacework. He has been building security systems for more than two decades.

According to Provos, the OpenClaw fiasco prompted him to ask: “How would I build a personal AI assistant while taking security seriously from the start?”

His answer: don’t trust the agent at all.

How IronCurtain Works

The name comes from theater - an iron curtain is a fireproof barrier between stage and audience. If something catches fire on stage, the curtain drops and contains the disaster. The agent performs; your systems stay protected.

IronCurtain implements what Provos calls a “chokepoint architecture.” Every action the AI agent tries to take - reading files, running commands, making API calls - passes through a policy enforcement layer. The agent never directly touches your system.

The architecture supports two modes:

Code Mode: The LLM writes TypeScript code that runs in an isolated V8 virtual machine. The agent has no filesystem or network access whatsoever. It can only invoke typed functions that map to MCP (Model Context Protocol) operations, which are individually evaluated by the policy engine.

Docker Mode: For external agents like Claude Code or Goose, the agent runs in a container with --network=none. It communicates only through a Unix socket to the MCP proxy. Even LLM API calls go through a TLS-terminating proxy that swaps in real credentials - the agent itself only sees fake API keys that pass format validation but do nothing.
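The credential swap in Docker Mode can be sketched in a few lines. This is a hypothetical illustration, not IronCurtain’s actual proxy code - the header name, key formats, and function are assumptions - but it shows the core idea: the agent only ever holds a placeholder, and the proxy substitutes the real credential at the trust boundary.

```typescript
// Illustrative sketch of a credential-swapping proxy (names are assumptions,
// not IronCurtain's actual code). The sandboxed agent never sees REAL_KEY.
const REAL_KEY = "sk-real-0000"; // stand-in; a real proxy would load this from secure storage
const FAKE_KEY = "sk-placeholder-0000"; // the format-valid dummy key given to the agent

function rewriteAuthHeader(headers: Record<string, string>): Record<string, string> {
  const out = { ...headers };
  // Swap the placeholder for the real credential as the request leaves the sandbox.
  if (out["x-api-key"] === FAKE_KEY) {
    out["x-api-key"] = REAL_KEY;
  }
  return out;
}

// The agent's outbound request, as seen inside the container...
const fromAgent = { "x-api-key": FAKE_KEY, "content-type": "application/json" };
// ...and as forwarded by the proxy, with the real key in place.
console.log(rewriteAuthHeader(fromAgent)["x-api-key"]);
```

Because the swap happens outside the container, even a fully compromised agent can’t exfiltrate a usable credential - it simply never has one.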

Policies in Plain English

Most security systems require complex configuration files. IronCurtain takes a different approach: you write your policies in plain English.

For example:

  • “The agent may read and write files in the project directory”
  • “May perform read-only git operations without approval”
  • “Must ask me before pushing to any remote”
  • “Never delete anything permanently”

The system uses an LLM to compile these natural language policies into deterministic enforcement rules. A separate verification step generates test scenarios and validates that the compiled rules actually match your intent.

When the agent tries to do something, the policy engine decides in real time whether to allow, deny, or escalate to human approval. If the action is denied, the system returns the constitution’s reasoning along with the refusal - helping re-anchor the model toward its original intent rather than whatever prompt injection or drift pushed it off course.
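The compiled rules and the allow/deny/escalate decision might look something like this. The shape below is a hypothetical sketch - the field names, rule names, and `decide` function are assumptions, not IronCurtain’s actual schema - mirroring the example constitution above:

```typescript
// Toy deterministic rules compiled from a plain-English constitution
// (illustrative only; not IronCurtain's actual rule format).
type Verdict = "allow" | "deny" | "escalate";

interface PolicyDecision {
  verdict: Verdict;
  rule: string;        // which compiled rule fired
  reasoning?: string;  // plain-English explanation returned to the agent on deny/escalate
}

function decide(tool: string, action: string): PolicyDecision {
  if (tool === "fs" && action === "delete") {
    return {
      verdict: "deny",
      rule: "never-delete",
      reasoning: "Constitution: 'Never delete anything permanently'.",
    };
  }
  if (tool === "git" && action === "push") {
    return {
      verdict: "escalate",
      rule: "ask-before-push",
      reasoning: "Constitution: 'Must ask me before pushing to any remote'.",
    };
  }
  // Everything else inside the project directory is allowed.
  return { verdict: "allow", rule: "default-project-scope" };
}

console.log(decide("fs", "delete").reasoning);
```

The key property is that the LLM only runs at compile time; at enforcement time, each request is matched against deterministic rules, so a prompt-injected agent can’t talk its way past the policy.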

What It Can’t Do

Provos is refreshingly honest about limitations. In his documentation, he acknowledges several constraints:

No outbound content inspection: An agent allowed to write files could theoretically encode sensitive data to bypass content-level controls. LLM-based intelligibility checks on outbound content are planned but not yet implemented.

Escalation fatigue: Too many false positives lead to habitual approval. Users need to tune their constitutions to minimize unnecessary prompts.

Prompt injection is unsolved: IronCurtain cannot prevent prompt injection attacks - nobody can. It can only contain the damage through sandbox isolation and policy enforcement.

It’s a research prototype: The project is explicitly described as “an early-stage research project,” not a production-ready consumer product.

Why This Matters

The current state of AI agent security is abysmal. Most agents run with full system access. Most frameworks treat security as an afterthought. And most users have no idea what their AI assistant is actually doing on their behalf.

IronCurtain demonstrates that a different model is possible. By treating the agent as untrusted from the start - the same way we treat untrusted code from the internet - we can build agents that are both useful and containable.

The project supports multiple LLM providers (Anthropic’s Claude, Google’s Gemini, OpenAI’s GPT-4) and includes 14 filesystem tools, 27 git operations, web fetching with markdown conversion, and even Signal integration for encrypted remote control.

Getting Started

IronCurtain requires Node.js 22+ and Docker. Installation is straightforward:

  npm install -g @provos/ironcurtain
  ironcurtain setup

The setup wizard walks through API key configuration, model selection, and initial policy creation. Running ironcurtain customize-policy lets you generate a constitution with LLM assistance.

The Bottom Line

IronCurtain isn’t perfect, and it won’t stop every attack. But it represents a serious attempt to bring real security engineering to the Wild West of AI agents. If you’re running autonomous AI tools with access to your files and credentials, this is the kind of architecture you should demand - even if you don’t use IronCurtain specifically.

The code is open source on GitHub under the Apache 2.0 license. Star count as of this writing: 75.