    Meta presents Self-Taught Evaluators: A New AI Approach that Aims to Improve Evaluators without Human Annotations and Outperforms Commonly Used LLM Judges Such as GPT-4

    August 7, 2024

    Advancements in NLP have led to the development of large language models (LLMs) capable of performing complex language-related tasks with high accuracy. These advancements have opened up new possibilities in technology and communication, allowing for more natural and effective human-computer interactions.

    A significant problem in NLP is the reliance on human annotations for model evaluation. Human-generated data is essential for training and validating models, but collecting this data is both costly and time-consuming. Furthermore, as models improve, previously collected annotations may need to be updated, reducing their utility in evaluating newer models. This creates a continuous need for fresh data, which poses challenges for scaling and sustaining effective model evaluations. Addressing this problem is crucial for advancing NLP technologies and their applications.

    Current methods for model evaluation typically involve collecting large amounts of human preference judgments over model responses. These methods include using automated metrics for tasks with reference answers or employing classifiers that output scores directly. However, these methods face limitations, especially for complex tasks where multiple valid responses are possible, such as creative writing or coding. The high variance in human judgments and the associated costs highlight the need for more efficient and scalable evaluation techniques.

    Researchers at Meta FAIR have introduced a novel approach called the “Self-Taught Evaluator.” This method eliminates the need for human annotations by using synthetically generated data for training. The process begins with a seed model, which produces contrasting synthetic preference pairs. The model then evaluates these pairs and improves iteratively, using its judgments to enhance its performance in subsequent iterations. This approach leverages the model’s capability to generate and evaluate data, significantly reducing dependency on human-generated annotations.
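
In code, the overall cycle might look like the following skeleton. This is a minimal sketch, not a released API: the three callables (`synthesize_pair`, `judge`, `fine_tune`) are hypothetical stand-ins for components the paper describes, injected so the loop itself is self-contained.

```python
from typing import Callable, List, Tuple

def self_taught_evaluator(
    synthesize_pair: Callable[[str], Tuple[str, str]],  # instruction -> (chosen, rejected)
    judge: Callable[[str, str, str], Tuple[str, str]],  # -> (verdict, reasoning trace)
    fine_tune: Callable[[List[tuple]], None],           # trains on collected judgments
    instructions: List[str],
    iterations: int = 3,
) -> None:
    for _ in range(iterations):
        examples = []
        for instruction in instructions:
            # Build a contrasting synthetic pair: a baseline response and a
            # deliberately inferior one, with no human labels involved.
            chosen, rejected = synthesize_pair(instruction)
            # The current evaluator judges the pair, producing a reasoning
            # trace alongside its verdict.
            verdict, reasoning = judge(instruction, chosen, rejected)
            # Keep only judgments that recover the known-better response.
            if verdict == "chosen":
                examples.append((instruction, chosen, rejected, reasoning))
        # Fine-tune the evaluator on its own correct judgments, then repeat.
        fine_tune(examples)
```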

    The proposed method involves several key steps. Initially, a baseline response is generated for a given instruction using a seed LLM. A modified version of the instruction is then created, prompting the LLM to generate a new response designed to be lower quality than the original. These paired responses form the basis for training data. The model, acting as an LLM-as-a-Judge, generates reasoning traces and judgments for these pairs. This process is repeated iteratively, with the model continually improving its judgment accuracy through self-generated and self-evaluated data, effectively creating a cycle of self-improvement.
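
A single training pair could be built along these lines. Here `llm` stands for any prompt-in, text-out completion function, and the perturbation prompt is illustrative wording rather than the paper's exact prompt.

```python
from typing import Callable, Dict

def build_preference_pair(llm: Callable[[str], str], instruction: str) -> Dict[str, str]:
    # Baseline: answer the original instruction directly.
    baseline = llm(instruction)

    # Perturb the instruction so that a faithful answer to the modified
    # version is, by construction, a worse answer to the original.
    modified = llm(
        "Write a similar but subtly different version of this instruction:\n"
        + instruction
    )
    worse = llm(modified)

    # The pair is labeled synthetically: `baseline` is preferred for
    # `instruction`, so no human annotation is required.
    return {"instruction": instruction, "chosen": baseline, "rejected": worse}
```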

    The performance of the Self-Taught Evaluator was tested using the Llama-3-70B-Instruct model. The method improved the model’s accuracy on the RewardBench benchmark from 75.4 to 88.7, matching or surpassing the performance of models trained with human annotations. This significant improvement demonstrates the effectiveness of synthetic data in enhancing model evaluation. Furthermore, the researchers conducted multiple iterations, further refining the model’s capabilities. The final model achieved 88.3 accuracy with a single inference and 88.7 with majority voting, showcasing its robustness and reliability.
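
At inference time, majority voting simply samples several independent judgments and keeps the most common verdict. A small sketch, where `judge_once` is a hypothetical sampler that returns one pairwise verdict (e.g., "A" or "B"):

```python
from collections import Counter
from typing import Callable

def majority_vote(judge_once: Callable[[], str], samples: int = 5) -> str:
    # Sample several independent judgments and return the most common one.
    votes = [judge_once() for _ in range(samples)]
    verdict, _count = Counter(votes).most_common(1)[0]
    return verdict
```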

    In conclusion, the Self-Taught Evaluator offers a scalable and efficient solution for NLP model evaluation. By leveraging synthetic data and iterative self-improvement, it addresses the challenges of relying on human annotations and keeps pace with the rapid advancements in language model development. This approach enhances model performance and reduces the dependency on human-generated data, paving the way for more autonomous and efficient NLP systems. The research team’s work at Meta FAIR marks a significant step forward in the quest for more advanced and autonomous evaluation methods in the field of NLP.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Meta presents Self-Taught Evaluators: A New AI Approach that Aims to Improve Evaluators without Human Annotations and Outperforms Commonly Used LLM Judges Such as GPT-4 appeared first on MarkTechPost.
