Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 16, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 16, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 16, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 16, 2025

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025

      Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

      May 16, 2025

      Microsoft might kill the Surface Laptop Studio as production is quietly halted

      May 16, 2025

      Minecraft licensing robbed us of this controversial NFL schedule release video

      May 16, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      The power of generators

      May 16, 2025
      Recent

      The power of generators

      May 16, 2025

      Simplify Factory Associations with Laravel’s UseFactory Attribute

      May 16, 2025

      This Week in Laravel: React Native, PhpStorm Junie, and more

      May 16, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025
      Recent

      Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

      May 16, 2025

      Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

      May 16, 2025

      Microsoft might kill the Surface Laptop Studio as production is quietly halted

      May 16, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Artificial Intelligence»LLM refusal training easily bypassed with past tense prompts

    LLM refusal training easily bypassed with past tense prompts

    July 27, 2024

    Researchers from the Swiss Federal Institute of Technology Lausanne (EPFL) found that writing dangerous prompts in the past tense bypassed the refusal training of the most advanced LLMs.

    AI models are commonly aligned using techniques like supervised fine-tuning (SFT) or reinforcement learning human feedback (RLHF) to make sure the model doesn’t respond to dangerous or undesirable prompts.

    This refusal training kicks in when you ask ChatGPT for advice on how to make a bomb or drugs. We’ve covered a range of interesting jailbreak techniques that bypass these guardrails but the method the EPFL researchers tested is by far the simplest.

    The researchers took a dataset of 100 harmful behaviors and used GPT-3.5 to rewrite the prompts in the past tense.

    Here’s an example of the method explained in their paper.

    Using an LLM to rewrite dangerous prompts in the past tense. Source: arXiv

    They then evaluated the responses to these rewritten prompts from these 8 LLMs: Llama-3 8B, Claude-3.5 Sonnet, GPT-3.5 Turbo, Gemma-2 9B, Phi-3-Mini, GPT-4o-mini, GPT-4o, and R2D2.

    They used several LLMs to judge the outputs and classify them as either a failed or a successful jailbreak attempt.

    Simply changing the tense of the prompt had a surprisingly significant effect on the attack success rate (ASR). GPT-4o and GPT-4o mini were especially susceptible to this technique.

    The ASR of this “simple attack on GPT-4o increases from 1% using direct requests to 88% using 20 past tense reformulation attempts on harmful requests.”

    Here’s an example of how compliant GPT-4o becomes when you simply rewrite the prompt in the past tense. I used ChatGPT for this and the vulnerability has not been patched yet.

    ChatGPT using GPT-4o refuses a present tense prompt but complies when it is rewritten in the past tense. Source: ChatGPT

    Refusal training using RLHF and SFT trains a model to successfully generalize to reject harmful prompts even if it hasn’t seen the specific prompt before.

    When the prompt is written in the past tense then the LLMs seem to lose the ability to generalize. The other LLMs didn’t fare much better than GPT-4o did although Llama-3 8B seemed most resilient.

    Attack success rates using present and past tense dangerous prompts. Source: arXiv

    Rewriting the prompt in the future tense saw an increase in the ASR but was less effective than past tense prompting.

    The researchers concluded that this could be because “the fine-tuning datasets may contain a higher proportion of harmful requests expressed in the future tense or as hypothetical events.”

    They also suggested that “The model’s internal reasoning might interpret future-oriented requests as potentially more harmful, whereas past-tense statements, such as historical events, could be perceived as more benign.”

    Can it be fixed?

    Further experiments demonstrated that adding past tense prompts to the fine-tuning data sets effectively reduced susceptibility to this jailbreak technique.

    While effective, this approach requires preempting the kinds of dangerous prompts that a user may input.

    The researchers suggest that evaluating the output of a model before it is presented to the user is an easier solution.

    As simple as this jailbreak is, it doesn’t seem that the leading AI companies have found a way to patch it yet.

    The post LLM refusal training easily bypassed with past tense prompts appeared first on DailyAI.

    Source: Read More 

    Hostinger
    Facebook Twitter Reddit Email Copy Link
    Previous ArticleAI Guide for new CFOs
    Next Article The Death of SaaS: How AI Will Rewrite the Rules of Software Programming Forever!

    Related Posts

    Security

    Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

    May 17, 2025
    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-4831 – TOTOLINK HTTP POST Request Handler Buffer Overflow Vulnerability

    May 17, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    No Matter How You Package It, Apple Intelligence Is AI

    Artificial Intelligence

    SysD Manager – GUI tool to manage systemd units

    Linux

    The Significance of Application Performance Monitoring for Businesses

    Development

    CodeSOD: No Limits on Repetition

    News & Updates

    Highlights

    CVE-2025-2767 – Arista NG Firewall User-Agent Cross-Site Scripting Remote Code Execution

    April 23, 2025

    CVE ID : CVE-2025-2767

    Published : April 23, 2025, 5:16 p.m. | 1 hour, 42 minutes ago

    Description : Arista NG Firewall User-Agent Cross-Site Scripting Remote Code Execution Vulnerability. This vulnerability allows remote attackers to execute arbitrary code on affected installations of Arista NG Firewall. Minimal user interaction is required to exploit this vulnerability.

    The specific flaw exists within the processing of the User-Agent HTTP header. The issue results from the lack of proper validation of user-supplied data, which can lead to the injection of an arbitrary script. An attacker can leverage this vulnerability to execute code in the context of root. Was ZDI-CAN-24407.

    Severity: 8.8 | HIGH

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    Conquering Facebook Ads: A God of War’s Warrior Guide to Advertising

    April 4, 2024

    I haven’t even decided which ending to Atomfall I want to do first, and someone is already speedrunning the game in under half an hour

    April 1, 2025

    How to Build Multi-Module Projects in Spring Boot for Scalable Microservices

    November 12, 2024
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.