Every time an employee asks an AI assistant to summarize a contract, analyze customer data, or draft a response using internal information, that prompt becomes a security liability. The query itself - not just the response - contains sensitive data that travels through systems, gets logged, and persists in ways most organizations haven’t accounted for.
While AI security discussions focus on training data poisoning and model safety, a more immediate risk has been hiding in plain sight: inference, the operational phase where models actually process requests. And according to a growing body of research, most enterprises aren’t protecting it.
The Gap Nobody Planned For
“Inference is AI working,” Tyson Macaulay, COO of 01Quantum, told attendees at a recent cybersecurity webinar. The critical exposure happens during operation, not during model development.
The problem is structural. Traditional security models weren’t designed for AI inference traffic. Encryption protects data until decryption - but once a prompt reaches an AI system, it’s exposed to application memory, runtime environments, and logging systems. Data loss prevention tools struggle with unstructured, dynamic AI prompts. And debugging logs create repositories of sensitive data far outside their original security perimeters.
According to CSO Online, prompts are treated as “operational exhaust rather than as high-value data.” That’s a category error. When employees feed contracts, customer records, strategic plans, and personally identifiable information into AI systems, they’re creating data flows that bypass standard security controls.
Nearly half of emerging AI security standards from NIST and ISO now focus specifically on prompt and inference security - a tacit admission that existing frameworks don’t cover this attack surface.
The Vulnerabilities Are Already Being Exploited
This isn’t theoretical. In early 2026, security researchers at Oligo Security discovered a chain of critical remote code execution vulnerabilities across major AI inference frameworks from Meta, Nvidia, Microsoft, and open-source projects.
The flaws shared a root cause: unsafe use of Python’s pickle deserialization. Successful exploitation could allow attackers to execute arbitrary code on GPU clusters, escalate privileges, exfiltrate model weights or customer data, or install cryptomining malware.
The affected frameworks and their CVEs:
- Meta Llama Stack (CVE-2024-50050)
- vLLM (CVE-2025-30165)
- Nvidia TensorRT-LLM (CVE-2025-23254)
- Modular Max Server (CVE-2025-60455)
SGLang was also impacted - affecting enterprises including xAI, AMD, Nvidia, Intel, LinkedIn, Cursor, Oracle Cloud, and Google Cloud.
What made these vulnerabilities particularly concerning was how they spread. Oligo found that developers had copied vulnerable code patterns across projects - sometimes identically, occasionally with comments like “Adapted from vLLM.” A single vulnerable component contaminated multiple downstream projects.
Prompt Injection Remains a Persistent Threat
Beyond infrastructure vulnerabilities, the models themselves remain exploitable. Prompt injection - where attackers manipulate AI systems through crafted inputs - appears in over 73% of production AI deployments assessed during security audits, according to Vectra.
The success rates are troubling. Attack success rates against state-of-the-art defenses exceed 85% when adaptive strategies are employed. And 90% of successful prompt injection attacks result in leakage of sensitive data.
Critical CVEs have been documented in Microsoft Copilot (CVSS 9.3), GitHub Copilot (CVSS 9.6), and Cursor IDE (CVSS 9.8) - demonstrating active production exploitation.
Even frontier models show vulnerability at scale. Testing against Claude Opus 4.5 in a coding environment showed a 4.7% attack success rate at one attempt, rising to 33.6% at 10 attempts and 63.0% at 100 attempts. That means persistent attackers have better than even odds of succeeding.
The Credential Problem
AI platforms have reached the same credential risk as other core enterprise SaaS solutions - but without the same security controls.
IBM’s 2026 X-Force Threat Intelligence Index found that infostealer malware led to the exposure of over 300,000 ChatGPT credentials in 2025. These stolen credentials give attackers access to conversation histories containing whatever sensitive information users have shared - and many users share a lot.
The broader trend: vulnerability exploitation became the leading cause of attacks in 2025, accounting for 40% of incidents observed by X-Force. AI-enabled vulnerability discovery is accelerating this trend, as attackers use automation to find missing authentication controls and exposed endpoints faster than defenders can patch them.
The Long-Term Cryptographic Risk
A subtler threat has emerged that security leaders now rank above model drift: “harvest now, decrypt later” attacks.
AI inference traffic contains data requiring years-long confidentiality - trade secrets, customer information, strategic plans - but uses short-term transport encryption. Adversaries can capture encrypted traffic today and store it for future decryption once quantum computing matures enough to break current cryptographic standards.
This isn’t paranoid speculation. 46.2% of infrastructure leaders recently surveyed said they lack confidence their AI systems meet anticipated 2026 security standards. The gap between AI deployment velocity and security architecture evolution continues to widen.
What Organizations Should Do
The recommendations from security researchers converge on several points:
Treat prompts as sensitive data. AI prompts should be classified and protected with the same rigor as the underlying data they reference. They’re not operational exhaust - they’re high-value information that happens to be formatted as questions.
Implement inference-specific monitoring. Traditional security tools weren’t designed for AI traffic patterns. Organizations need visibility into what’s being sent to AI systems, what’s being logged, and who has access.
Address over-permissioned access. Internal exposure exceeds external threat severity in most assessments. Over-permissioned service accounts, misconfigured logging, and legitimate access create “silent prompt leakage” without any attacker involvement.
Inventory cryptographic dependencies. Organizations should understand which AI traffic requires long-term confidentiality and plan for cryptographic agility - the ability to upgrade encryption methods without wholesale infrastructure replacement.
Patch inference frameworks. The ShadowMQ vulnerabilities have been patched in Meta Llama Stack v.0.0.41+, Nvidia TensorRT-LLM 0.18.2+, vLLM v0.8.0+, and Modular Max Server v25.6+. Organizations should verify they’re running updated versions.
The Bottom Line
AI security has been discussed primarily in terms of training data and model behavior. But inference - where AI actually does work - represents the more immediate enterprise risk. Every query is a potential data leak. Every logging configuration is a potential exposure point. Every unpatched framework is a potential entry point.
The uncomfortable truth is that most organizations deployed AI systems before understanding their security implications. Now they’re playing catch-up against attackers who’ve already noticed the gap.