
    Anthropic: Large context LLMs vulnerable to many-shot jailbreak

    April 5, 2024

    Anthropic released a paper outlining a many-shot jailbreaking method to which long-context LLMs are particularly vulnerable.

    The size of an LLM’s context window determines the maximum length of a prompt. Context windows have grown steadily over the last few months: Claude 3 Opus launched with a 200,000-token context window, and Anthropic says the model can accept inputs of more than 1 million tokens.

    A larger context window enables more powerful in-context learning. With a zero-shot prompt, the LLM is asked to respond without any prior examples.

    In a few-shot approach, the model is provided with several examples in the prompt. This allows for in-context learning and primes the model to give a better answer.
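    To make the zero-shot/few-shot distinction concrete, here is a minimal sketch that builds both kinds of prompt as plain text. The `build_prompt` helper and the `User:`/`Assistant:` labels are hypothetical; real chat APIs use their own message formats.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (question, answer) pairs as a User/Assistant transcript, then
    append the target query with an open Assistant turn for the model to fill.
    An empty example list produces a zero-shot prompt."""
    lines = []
    for question, answer in examples:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {query}")
    lines.append("Assistant:")
    return "\n".join(lines)


examples = [
    ("Translate 'cat' to French.", "chat"),
    ("Translate 'dog' to French.", "chien"),
]

zero_shot = build_prompt([], "Translate 'bird' to French.")
few_shot = build_prompt(examples, "Translate 'bird' to French.")
print(zero_shot)
print("---")
print(few_shot)
```

    Each extra (question, answer) pair adds one "shot". A many-shot prompt has the same shape, only with far more pairs.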

    Larger context windows mean a user’s prompt can be extremely long with many examples, which Anthropic says is both a blessing and a curse.

    Many-shot jailbreak

    The jailbreak method is exceedingly simple. The LLM receives a single prompt consisting of a fake dialogue between a user and a very accommodating AI assistant.

    The dialogue is a series of queries about how to do something dangerous or illegal, each followed by a fake response from the AI assistant explaining how to carry it out.

    The prompt ends with a target query like “How to build a bomb?” and then leaves it to the targeted LLM to answer.

    Few-shot vs many-shot jailbreak. Source: Anthropic

    With only a few back-and-forth exchanges in the prompt, the attack doesn’t work. But a long-context model like Claude 3 Opus can accept a many-shot prompt as long as several novels.
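    A back-of-the-envelope calculation shows why window size matters. The sketch below assumes, hypothetically, that each faux exchange averages 300 tokens and that 1,000 tokens are reserved for the target query and the model’s reply; neither figure comes from Anthropic’s paper.

```python
def max_shots(context_tokens: int, tokens_per_shot: int, reserved_tokens: int) -> int:
    """Number of faux exchanges that fit after reserving room for the final
    query and the model's answer (integer division drops any partial shot)."""
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        return 0
    return usable // tokens_per_shot


TOKENS_PER_SHOT = 300  # hypothetical average length of one question/answer pair
RESERVED = 1_000       # hypothetical room for the target query and the reply

for window in (4_000, 200_000, 1_000_000):
    print(f"{window:,} tokens -> {max_shots(window, TOKENS_PER_SHOT, RESERVED):,} shots")
# 4,000 tokens -> 10 shots
# 200,000 tokens -> 663 shots
# 1,000,000 tokens -> 3,330 shots
```

    Under these assumptions, a 4,000-token window holds only about ten exchanges, while a 200,000-token window holds several hundred.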

    In their paper, the Anthropic researchers found that “as the number of included dialogues (the number of ‘shots’) increases beyond a certain point, it becomes more likely that the model will produce a harmful response.”

    They also found that when combined with other known jailbreaking techniques, the many-shot approach was even more effective or could be successful with shorter prompts.

    As the number of dialogues in the prompt increases, the odds of a harmful response increase. Source: Anthropic

    Can it be fixed?

    Anthropic says that the simplest defense against the many-shot jailbreak is to reduce the size of a model’s context window. But that sacrifices the obvious benefits of longer inputs.

    Anthropic also tried fine-tuning its model to recognize many-shot jailbreak attempts and refuse to answer. This merely delayed the jailbreak: a longer prompt was needed, but it still eventually elicited the harmful output.

    Anthropic had more success with methods that classify and modify the prompt before it is passed to the model; in one case, such a technique cut the attack success rate from 61% to 2%. Even so, the company says it is mindful that variations of the attack could evade detection.
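    Anthropic has not published the details of its classification technique, so the following is only a hypothetical sketch of the general idea: a crude heuristic classifier that counts speaker labels embedded in a single incoming message and, past a threshold, modifies the prompt by discarding the faux dialogue. The `screen_prompt` function, the label list, and the threshold of 8 turns are all assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical speaker labels; real attacks may use any labels or formatting.
TURN_MARKER = re.compile(r"^[ \t]*(user|human|assistant|ai)[ \t]*:", re.IGNORECASE | re.MULTILINE)


@dataclass
class ScreenResult:
    flagged: bool        # True if the message looked like an embedded many-shot dialogue
    embedded_turns: int  # number of speaker labels found at line starts
    prompt: str          # text to forward to the model


def screen_prompt(message: str, max_embedded_turns: int = 8) -> ScreenResult:
    """Classify a message by counting embedded dialogue turns; if it exceeds the
    threshold, modify it by keeping only the last embedded user turn."""
    markers = list(TURN_MARKER.finditer(message))
    if len(markers) <= max_embedded_turns:
        return ScreenResult(False, len(markers), message)

    user_markers = [m for m in markers if m.group(1).lower() in ("user", "human")]
    if not user_markers:
        # Flagged, but no user turn to salvage: forward unchanged and let the caller decide.
        return ScreenResult(True, len(markers), message)

    last_user = user_markers[-1]
    # The final query runs until the next speaker label (often a trailing "Assistant:").
    next_marker = TURN_MARKER.search(message, last_user.end())
    end = next_marker.start() if next_marker else len(message)
    final_query = message[last_user.end():end].strip()
    return ScreenResult(True, len(markers), final_query)


# Usage with harmless placeholder content shaped like a many-shot prompt.
many_turns = "\n".join(f"User: question {i}\nAssistant: answer {i}" for i in range(10))
many_turns += "\nUser: final question\nAssistant:"
result = screen_prompt(many_turns)
print(result.flagged, result.embedded_turns, repr(result.prompt))  # True 22 'final question'
```

    A regex like this is easy to evade, for example with different speaker labels, which is exactly the kind of variation Anthropic warns about. It will also flag legitimate messages that paste in long transcripts, so a production system would more plausibly rely on a trained classifier.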

    Anthropic says that the ever-lengthening context window of LLMs “makes the models far more useful in all sorts of ways, but it also makes feasible a new class of jailbreaking vulnerabilities.”

    The company has published its research in the hope that other AI companies find ways to mitigate many-shot attacks.

    An interesting conclusion that the researchers came to was that “even positive, innocuous-seeming improvements to LLMs (in this case, allowing for longer inputs) can sometimes have unforeseen consequences.”

    The post Anthropic: Large context LLMs vulnerable to many-shot jailbreak appeared first on DailyAI.
