Microsoft: North Korean Hackers Are Jailbreaking AI Models for Cyberattacks

North Korean hacking groups are actively using AI to write phishing emails, generate malware, and debug attack scripts. And when the models refuse to help, they’re getting better at making them comply.

A March 6 report from Microsoft Threat Intelligence documents how state-sponsored threat actors have moved beyond experimentation with AI tools. They’re now operationalizing these capabilities across the entire cyberattack lifecycle.

What the Threat Actors Are Doing

According to Microsoft’s findings, groups including Jasper Sleet, Coral Sleet, Emerald Sleet, and Sapphire Sleet (all linked to North Korea) are using generative AI to:

Draft phishing lures and translate content for social engineering
Summarize stolen data after infiltration
Generate and debug malware code
Scaffold attack scripts and infrastructure

These aren’t theoretical use cases. Microsoft Threat Intelligence has observed them in the wild.

The Jailbreak Problem

When AI models refuse to help with obviously malicious tasks, threat actors don’t give up. They jailbreak.

Microsoft documents several techniques that work:

Role-based prompting: Actors prompt models to assume trusted roles, claiming to be security researchers, system administrators, or operating under legitimate institutional authority. This establishes what Microsoft calls “a shared context of legitimacy” that overrides safety guardrails.

Instruction chaining: Rather than asking for malicious output directly, attackers break requests across multiple interactions. Each step seems benign in isolation; combined, they produce harmful results.

System prompt manipulation: Actors misuse developer-style prompts to coerce models into generating content they would otherwise refuse.

These techniques work. A separate Cisco study found multi-turn jailbreak attacks succeed against open-weight AI models 92.78% of the time. Research published in Nature Communications shows autonomous jailbreak agents (LLMs attacking other LLMs) achieve a 97.14% success rate.

Why This Should Worry You

The safety measures keeping AI from helping bad actors aren’t holding. Most models are trained to refuse specific harmful requests, but they’re also trained to be helpful, to maintain conversation flow, to treat users as legitimate.

Threat actors exploit this. They don’t need sophisticated technical attacks when social engineering works on machines just like it works on humans. When a prompt invokes expertise, urgency, or institutional authority, helpfulness training overrides safety training.

Microsoft notes that “emerging experimentation with agentic AI signals a potential shift in tradecraft, where AI-supported workflows increasingly assist iterative decision-making and task execution.” Translation: this is going to get worse as AI agents become more capable and autonomous.

What’s Being Done (And Why It’s Not Enough)

AI companies are playing defense. They patch specific jailbreak techniques as they’re discovered, add new training data to reinforce refusals, and hope their red teams find vulnerabilities before attackers do.

But the fundamental problem remains: models optimized for helpfulness will find ways to be helpful. Role-based jailbreaks work precisely because models are trained to assist authority figures. Instruction chaining works because models are trained to maintain coherent conversations.

The techniques that make AI useful are the same techniques that make it exploitable. Until that changes, nation-state hackers will keep finding ways in, and the rest of us will keep reading about it in Microsoft’s threat reports.