    Salesforce AI Introduces BingoGuard: An LLM-based Moderation System Designed to Predict Both Binary Safety Labels and Severity Levels

    April 2, 2025

    Large language models (LLMs) now power many interactive products, and with that reach comes the risk that they generate harmful content. Traditional moderation systems typically use a binary classification (safe vs. unsafe), which is too coarse to distinguish between levels of harm. The result is either overly restrictive moderation, which degrades the user experience, or inadequate filtering, which exposes users to harmful content.

    Salesforce AI introduces BingoGuard, an LLM-based moderation system that addresses this limitation by predicting both a binary safety label and a severity level. BingoGuard uses a structured taxonomy that sorts potentially harmful content into eleven topic areas, including violent crime, sexual content, profanity, privacy invasion, and weapon-related content. Each area has five defined severity levels, ranging from benign (level 0) to extreme risk (level 4). Platforms can therefore set moderation thresholds according to their own safety guidelines instead of accepting a single safe/unsafe cutoff.
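
    To make the calibration idea concrete, here is a minimal sketch of how a platform might turn a (category, severity) verdict into an allow/block decision. The class names, category strings, and threshold values are all hypothetical; the article does not describe BingoGuard's actual output format or any policy API.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_SEVERITY = 4  # the article describes levels 0 (benign) through 4 (extreme risk)


@dataclass
class ModerationResult:
    """Hypothetical container for a severity-aware moderation verdict."""
    category: str  # illustrative label such as "weapon" or "profanity"
    severity: int  # 0..4


@dataclass
class PlatformPolicy:
    """Highest severity level a platform tolerates, overall and per category."""
    default_max_allowed: int = 0
    per_category: Dict[str, int] = field(default_factory=dict)

    def allows(self, result: ModerationResult) -> bool:
        if not 0 <= result.severity <= MAX_SEVERITY:
            raise ValueError(f"severity must be in 0..{MAX_SEVERITY}")
        # Fall back to the default limit when a category has no override.
        limit = self.per_category.get(result.category, self.default_max_allowed)
        return result.severity <= limit


# Assumed example policies: a strict platform that tolerates mild profanity only,
# and a lenient one that accepts anything up to level 2.
strict = PlatformPolicy(default_max_allowed=0, per_category={"profanity": 1})
lenient = PlatformPolicy(default_max_allowed=2)

sample = ModerationResult(category="weapon", severity=2)
print(strict.allows(sample))   # False
print(lenient.allows(sample))  # True
```

    The same verdict is blocked under one policy and allowed under the other, which is the flexibility a single safe/unsafe label cannot express.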

    BingoGuard’s training dataset, BingoGuardTrain, is assembled with a “generate-then-filter” methodology and contains 54,897 entries spanning multiple severity levels and content styles. The framework first generates responses tailored to different severity tiers, then filters them against defined quality and relevance standards. A specialized LLM is fine-tuned for each severity tier on carefully selected, expert-audited seed datasets, so that its outputs follow the predefined severity rubric for that tier. The resulting moderation model, BingoGuard-8B, is trained on this curated dataset, which lets it differentiate among degrees of harmful content rather than only flagging content as unsafe.
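
    The generate-then-filter loop described above can be sketched roughly as follows. This is an illustration of the control flow only: the generators and the filter are stubs, every name is hypothetical, and which tiers receive their own generator is an assumption. It is not the authors' pipeline.

```python
from typing import Callable, Dict, List, NamedTuple


class Candidate(NamedTuple):
    prompt: str
    response: str
    target_level: int  # severity tier the generator is meant to produce


def generate_then_filter(
    prompts: List[str],
    generators: Dict[int, Callable[[str], str]],
    keep: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Generate one response per (prompt, tier), then keep those that pass the filter."""
    kept: List[Candidate] = []
    for prompt in prompts:
        for level, generate in sorted(generators.items()):
            candidate = Candidate(prompt, generate(prompt), level)
            if keep(candidate):
                kept.append(candidate)
    return kept


def make_stub_generator(level: int) -> Callable[[str], str]:
    # Stand-in for a tier-specific fine-tuned LLM; it only tags the text.
    return lambda prompt: f"[level-{level} response to: {prompt}]"


def stub_filter(candidate: Candidate) -> bool:
    # Stand-in for the quality/relevance check: keep responses whose tag
    # matches the tier they were generated for.
    return f"level-{candidate.target_level}" in candidate.response


# Assumption for this sketch: one generator for each of levels 1 through 4.
generators = {level: make_stub_generator(level) for level in range(1, 5)}
dataset = generate_then_filter(["example prompt"], generators, stub_filter)
print(len(dataset))  # 4
```

    In a real pipeline the filter would reject some candidates, so the kept set would be smaller than the number generated.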

    On BingoGuardTest, an expert-labeled dataset of 988 examples, BingoGuard-8B achieves higher detection accuracy than leading moderation models such as WildGuard and ShieldGemma, with improvements of up to 4.3%. It is notably more accurate on lower-severity content (levels 1 and 2), which is traditionally difficult for binary classifiers. Further analysis found only a relatively weak correlation between the predicted “unsafe” probability and the actual severity level, which underscores the need to model severity explicitly. These findings point to gaps in moderation methods that rely primarily on binary classification.
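
    One way to check how well a binary classifier's “unsafe” probability tracks severity is to compute a correlation coefficient between the two. The sketch below uses Pearson's r as an assumed choice; the article does not say which statistic the authors used. The input values are made up purely for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only: a classifier that is confident every
# item is unsafe, regardless of how severe each item actually is.
unsafe_prob = [0.91, 0.88, 0.95, 0.90, 0.93]
severity = [1, 4, 2, 3, 1]

r = pearson(unsafe_prob, severity)
print(f"r = {r:.2f}")
```

    A value of r near zero would mean the probability says little about severity, which is the situation that motivates predicting severity as a separate output.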

    In conclusion, BingoGuard makes AI-driven content moderation more precise by pairing a binary safety judgment with a severity assessment. Platforms can apply moderation with finer control, reducing the risks of both overly cautious and insufficient filtering. BingoGuard thus offers an improved framework for moderating increasingly sophisticated AI-generated content.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post Salesforce AI Introduce BingoGuard: An LLM-based Moderation System Designed to Predict both Binary Safety Labels and Severity Levels appeared first on MarkTechPost.
