Microsoft reveal â€œSkeleton Key Jailbreakâ€ which works across different AI models

Microsoft security researchers have discovered a new way to manipulate AI systems into ignoring their ethical constraints and generating harmful, unrestricted content.Â

This â€œSkeleton Keyâ€ jailbreak uses a series of prompts to gaslight the AI into believing it should comply with any request, no matter how unethical.Â

Itâ€™s remarkably easy to execute. The attacker simply reframed their request as coming from an â€œadvanced researcherâ€ requiring â€œuncensored informationâ€ for â€œsafe educational purposes.â€

When exploited, these AIs readily provided information on topics like explosives, bioweapons, self-harm, graphic violence, and hate speech.

â€œThe Skeleton Keyâ€ is a remarkably simple jailbreak. Source: Microsoft.

The compromised models included Metaâ€™s Llama3-70b-instruct, Googleâ€™s Gemini Pro, OpenAIâ€™s GPT-3.5 Turbo and GPT-4o, Anthropicâ€™s Claude 3 Opus, and Cohereâ€™s Commander R Plus.Â

Among the tested models, only OpenAIâ€™s GPT-4 demonstrated resistance. Even then, it could be compromised if the malicious prompt was submitted through its application programming interface (API).

Despite models becoming more complex, jailbreaking them remains quite straightforward. Since there are many different forms of jailbreaks, itâ€™s nearly impossible to combat them all.Â

In March 2024, a team from the University of Washington, Western Washington University, and Chicago University published a paper on â€œArtPrompt,â€ a method that bypasses an AIâ€™s content filters using ASCII artÂ â€“ a graphic design technique that creates images from textual characters.

In April, Anthropic highlighted another jailbreak risk stemming from the expanding context windows of language models. For this type of jailbreak, an attacker feeds the AI an extensive prompt containing a fabricated back-and-forth dialogue.

The conversation is loaded with queries on banned topics and corresponding replies showing an AI assistant happily providing the requested information.Â After being exposed to enough of these fake exchanges, the targeted model can be coerced into breaking its ethical training and complying with a final malicious request.

As Microsoft explains in their blog post, jailbreaks reveal the need to fortify AI systems from every angle:

Implementing sophisticated input filtering to identify and intercept potential attacks, even when disguised
Deploying robust output screening to catch and block any unsafe content the AI generates
Meticulously designing prompts to constrain an AIâ€™s ability to override its ethical training
Utilizing dedicated AI-driven monitoring to recognize malicious patterns across user interactions

But the truth is, Skeleton Key is a simple jailbreak. If AI developers canâ€™t protect that, what hope is there for some more complex approaches?

Some vigilante ethical hackers, like Pliny the Prompter, have been featured in the media for their work in exposing how vulnerable AI models are to manipulation.

honored to be featured on @BBCNews! pic.twitter.com/S4ZH0nKEGX

â€” Pliny the Prompter (@elder_plinius) June 28, 2024

Itâ€™s worth stating that this research was, in part, an opportunity to market Microsoftâ€™s Azure AI new safety features like Content Safety Prompt Shields.

These assist developers in preemptively testing for and defending against jailbreaks.Â

But even so, Skeleton Key reveals again how vulnerable even the most advanced AI models can be to the most basic manipulation.

The post Microsoft reveal â€œSkeleton Key Jailbreakâ€ which works across different AI models appeared first on DailyAI.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

Minecraft licensing robbed us of this controversial NFL schedule release video

The power of generators

The power of generators

Simplify Factory Associations with Laravel’s UseFactory Attribute

This Week in Laravel: React Native, PhpStorm Junie, and more

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

Microsoft reveal â€œSkeleton Key Jailbreakâ€ which works across different AI models

Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

CVE-2025-48187 – RAGFlow Authentication Bypass

Explaining Eleventy (11ty) â€“ The Beginner-Friendly Static Site Generator

U.S. Releases High-Profile Russian Hackers in Diplomatic Prisoner Exchange

CVE-2025-1948 – Jetty HTTP/2 Buffer Overflow

Four Perficient Colleagues Named 2024 CRN Women of the Channel

Smashing Security podcast #370: The closed loop conundrum, default passwords, and Baby Reindeer

Unity Catalog, the Well-Architected Lakehouse and Performance Efficiency

Identity theft – six tips to help keep yours safe

This AI paper from DeepSeek-AI Explores How DeepSeek-V3 Delivers High-Performance Language Modeling by Minimizing Hardware Overhead and Maximizing Computational Efficiency

Microsoft reveal â€œSkeleton Key Jailbreakâ€ which works across different AI models

Related Posts

Microsoft reveal â€œSkeleton Key Jailbreakâ€ which works across different AI models