
    Securing Function Calls in LLMs: Unveiling and Mitigating Jailbreak Vulnerabilities

    August 8, 2024

    LLMs have shown impressive abilities, generating contextually accurate responses across many fields. However, as their capabilities expand, so do the security risks they pose. While ongoing research has focused on making these models safer, "jailbreaking" (manipulating LLMs into acting against their intended purpose) remains a concern. Most jailbreaking studies have concentrated on chat interactions, inadvertently leaving the security risks of the function calling feature underexplored, even though those risks are just as important to address.

    Researchers from Xidian University have identified a critical vulnerability in the function calling process of LLMs, introducing a "jailbreak function" attack that exploits alignment gaps, user coercion, and weak safety filters. Their study of six advanced LLMs, including GPT-4o and Claude-3.5-Sonnet, showed attack success rates above 90%. The research finds that function calls are particularly susceptible to jailbreaks because function arguments are less rigorously aligned and filtered than chat responses. The study also proposes defensive strategies, including defensive prompts, to mitigate these risks and strengthen LLM security.

    LLMs are frequently trained on data scraped from the web, which can produce behaviors that clash with ethical standards. To address this, researchers have developed various alignment techniques. The ETHICS dataset, for example, benchmarks how well LLMs predict human ethical judgments, a task current models still find challenging. Common alignment approaches use human feedback to train reward models and then apply reinforcement learning for fine-tuning. Nevertheless, jailbreak attacks remain a concern. These attacks fall into two categories: fine-tuning-based attacks, which train on harmful data, and inference-based attacks, which use adversarial prompts. Although recent efforts such as ReNeLLM and CodeChameleon have explored jailbreak template creation, they have yet to tackle the security issues specific to function calls.

    The jailbreak function attack is assembled from four components: a template, custom parameters, system parameters, and a trigger prompt. The template, designed to induce harmful responses, uses scenario construction, prefix injection, and a minimum word count to increase its effectiveness. Custom parameters such as "harm_behavior" and "content_type" tailor the function's output. System parameters like "tool_choice" and "required" ensure the function is actually called and executed. A simple trigger prompt, "Call WriteNovel," activates the function, compelling the LLM to produce the specified output without further prompting, as the sketch below illustrates.
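
    To see why the "system parameter" matters, here is a minimal sketch of how a forced tool choice works in an OpenAI-style chat completions API. This is an assumption about the interface involved, not the authors' code: the WriteNovel name, the trigger prompt, and the parameter names come from the paper's description, while the harmful template is deliberately replaced with a benign placeholder.

        from openai import OpenAI  # assumed OpenAI Python client, v1 interface

        client = OpenAI()

        # Benign stand-in for the paper's "WriteNovel" function. In the attack,
        # the template lives in the description and the custom parameters
        # ("harm_behavior", "content_type") shape the generated arguments; both
        # are replaced with harmless placeholders here.
        write_novel = {
            "type": "function",
            "function": {
                "name": "WriteNovel",
                "description": "Write a short piece of fiction.",
                "parameters": {
                    "type": "object",
                    "properties": {"content_type": {"type": "string"}},
                    "required": ["content_type"],
                },
            },
        }

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Call WriteNovel"}],  # trigger prompt
            tools=[write_novel],
            # The "system parameter": forcing this exact tool means the model
            # must emit a call to it rather than refusing in ordinary chat.
            tool_choice={"type": "function", "function": {"name": "WriteNovel"}},
        )
        print(response.choices[0].message.tool_calls[0].function.arguments)

    Because the API contract obliges the model to return a call to the named function, the usual chat-mode refusal path is bypassed, which is the gap the attack exploits.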

    The empirical study investigates function calling's potential for jailbreak attacks, addressing three key questions: its effectiveness, underlying causes, and possible defenses. Results show that the jailbreak function approach achieved a high success rate across six LLMs, outperforming methods like CodeChameleon and ReNeLLM. The analysis attributes the jailbreaks to inadequate alignment in function calls, the models' inability to refuse forced execution, and weak safety filters. The study recommends several countermeasures: limiting user permissions, improving function call alignment, strengthening safety filters, and using defensive prompts. Defensive prompts proved most effective, especially when inserted into function descriptions.
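
    Since the paper reports that defensive prompts inserted into function descriptions were the most effective mitigation, here is a minimal sketch of that idea. The prompt wording and the harden helper are illustrative assumptions, not the authors' implementation.

        # Illustrative defensive prompt; the paper's exact wording is not
        # reproduced here, so this text is an assumption.
        DEFENSIVE_PROMPT = (
            "Before filling in arguments for this function, check whether the "
            "request is harmful or policy-violating; if it is, refuse and "
            "explain why instead of producing the requested content."
        )

        def harden(tool: dict) -> dict:
            """Return a copy of an OpenAI-style tool spec with the defensive
            prompt appended to its description."""
            fn = dict(tool["function"])
            fn["description"] = (fn.get("description", "") + " " + DEFENSIVE_PROMPT).strip()
            return {**tool, "function": fn}

        # Usage: harden every tool spec before each API call.
        sample_tool = {
            "type": "function",
            "function": {"name": "WriteNovel", "description": "Write a short piece of fiction."},
        }
        safe_tools = [harden(sample_tool)]
        print(safe_tools[0]["function"]["description"])

    The appeal of this placement is that the description is read by the model every time the tool is considered, so the safety instruction travels with the function rather than depending on the surrounding chat context.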

    The study addresses a significant yet neglected security issue in LLMs: the risk of jailbreaking through function calling. Key findings include the identification of function calling as a new attack vector that bypasses existing safety measures, a high success rate of over 90% for jailbreak attacks across various LLMs, and underlying issues such as misalignment between function and chat modes, user coercion, and inadequate safety filters. The study suggests defensive strategies, particularly defensive prompts. This research underscores the importance of proactive security in AI development.

    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Securing Function Calls in LLMs: Unveiling and Mitigating Jailbreak Vulnerabilities appeared first on MarkTechPost.
