Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Sunshine And March Vibes (2025 Wallpapers Edition)

      May 18, 2025

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      May 18, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      May 18, 2025

      How To Prevent WordPress SQL Injection Attacks

      May 18, 2025

      I need to see more from Lenovo’s most affordable gaming desktop, because this isn’t good enough

      May 18, 2025

      Gears of War: Reloaded — Release date, price, and everything you need to know

      May 18, 2025

      I’ve been using the Logitech MX Master 3S’ gaming-influenced alternative, and it could be your next mouse

      May 18, 2025

      Your Android devices are getting several upgrades for free – including a big one for Auto

      May 18, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      YTConverter™ lets you download YouTube videos/audio cleanly via terminal — especially great for Termux users.

      May 18, 2025
      Recent

      YTConverter™ lets you download YouTube videos/audio cleanly via terminal — especially great for Termux users.

      May 18, 2025

      NodeSource N|Solid Runtime Release – May 2025: Performance, Stability & the Final Update for v18

      May 17, 2025

      Big Changes at Meteor Software: Our Next Chapter

      May 17, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      I need to see more from Lenovo’s most affordable gaming desktop, because this isn’t good enough

      May 18, 2025
      Recent

      I need to see more from Lenovo’s most affordable gaming desktop, because this isn’t good enough

      May 18, 2025

      Gears of War: Reloaded — Release date, price, and everything you need to know

      May 18, 2025

      I’ve been using the Logitech MX Master 3S’ gaming-influenced alternative, and it could be your next mouse

      May 18, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Fine-tuning AdvPrompter: A Novel AI Method to Generate Human-Readable Adversarial Prompt

    Fine-tuning AdvPrompter: A Novel AI Method to Generate Human-Readable Adversarial Prompt

    May 1, 2024

    Large Language Models (LLMs) have succeeded greatly and are widely used in various fields. LLMs are sensitive to input prompts, and this behavior has led to multiple research studies to understand and exploit this characteristic. This helps to create prompts for learning tasks like zero-shot and in-context. For instance, AutoPrompt recognizes task-specific tokens for zero-shot text classification and fact retrieval. This approach uses gradient-based scoring of tokens considering task-specific loss evaluation to find the optimal probability distributions over discrete tokens.

    Despite showing great capability, LLMs sometimes become vulnerable to certain jailbreaking attacks due to which irrelevant or toxic contents are generated. The main cause of jailbreaking attacks is the requirement of adversarial prompts by manual re-teaming, and one of its examples is inserting a suffix to a given instruction, which is inadequate and time-consuming. However, the automated generation of adversarial prompts frequently results in attacks that lack semantic meaning, can be easily identified by filters based on perplexity, and may need gradient information from the TargetLLM. 

    Researchers from AI at Meta, and Max-Planck-Institute for Intelligent Systems, Tubingen, Germany, introduced a novel method that uses another LLM, AdvPrompter, to generate human-readable adversarial prompts in seconds. Compared to other optimized approaches, this method is ∼ 800× faster. The AdvPrompter is trained by utilizing an AdvPromterTrain algorithm that does not need access to the TargetLLM gradients. The trained AdvPrompter can generate suffixes and veil the input instruction, keeping its meaning intact. This tactic lures the TargetLLM into providing a harmful response. 

    The approach proposed by researchers has the following key advantages:

    It enhances human readability with the help of AdvPromter, which generates clear human-readable adversarial prompts.

    Researchers’ experiments on multiple open-source LLMs have demonstrated excellent attack success rates (ASR) compared to previous approaches such as GCG and AutoDAN.

    The trained AdvPrompter can generate adversarial suffixes using next-token prediction, unlike previous methods such as GCG and AutoDAN, which need to solve new optimization problems for every generated suffix. 

    Generated adversarial suffixes with the help of trained AdvPromter are random with a non-zero temperature that allows users to sample a diverse set of adversarial prompts rapidly. Evaluation of more samples leads to better performance and a successful outcome. It further stabilizes at around k = 10, where k is the number of candidates of a score vector. Moreover, researchers found that the initial version of Llama2-7b constantly improves without fine-tuning, which means that generated suffixes with diversity are helpful for a successful attack.

    In conclusion, researchers proposed a novel method for automated red-teaming of LLMs. The main approach includes training AdvPromter using an algorithm called AdvPromterTrain to generate human-readable adversarial prompts. Further, a novel algorithm called AdvPromterOpt is useful for automatically generating adversarial prompts. It is also used in the training loop to fine-tune the AdvPrompter predictions. Future work includes a detailed analysis of safety fine-tuning from automatically generated data, which is motivated by the robust increase of the TargetLLM via AdvPrompter.

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

    If you like our work, you will love our newsletter..

    Don’t Forget to join our 40k+ ML SubReddit

    The post Fine-tuning AdvPrompter: A Novel AI Method to Generate Human-Readable Adversarial Prompt appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleNatural language boosts LLM performance in coding, planning, and robotics
    Next Article PyTorch Introduces ExecuTorch Alpha: An End-to-End Solution Focused on Deploying Large Language Models and Large Machine Learning ML Models to the Edge

    Related Posts

    Development

    February 2025 Baseline monthly digest

    May 18, 2025
    Artificial Intelligence

    Markus Buehler receives 2025 Washington Award

    May 18, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center

    Development

    I was skeptical of clip-style earbuds, then I took this budget pair on a run

    News & Updates

    I turned my Windows PC into an EdgeBook — Microsoft’s web apps were the most glaring issue

    News & Updates

    OpenAI to use Oracle Cloud Infrastructure (OCI) to extend Azure AI platform

    Development

    Highlights

    News & Updates

    Assassin’s Creed Shadows crosses 3 million players just a week after launch

    March 27, 2025

    Assassin’s Creed Shadows crosses 3 million players since launch, cementing its place as the second…

    Create Christmas Icons with JavaScript and HTML

    January 8, 2025

    How to fix storage drive not showing on File Explorer on Windows 11

    July 4, 2024

    Open-Source TTS Reaches New Heights: Nari Labs Releases Dia, a 1.6B Parameter Model for Real-Time Voice Cloning and Expressive Speech Synthesis on Consumer Device

    April 23, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.