
    Generalizable Reward Model (GRM): An Efficient AI Approach to Improve the Generalizability and Robustness of Reward Learning for LLMs

    July 12, 2024

Pretrained large models have shown impressive abilities across many fields. Recent research focuses on ensuring these models align with human values and avoid harmful behaviors. Two primary alignment methods are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). RLHF depends on a reward model that must generalize to new prompt-response pairs, and training a reward model that works well on unseen data remains a challenge. A common failure mode is “overoptimization,” also known as “reward hacking,” where the policy exploits weaknesses in the reward model. Increasing the size of the reward model and the amount of training data can mitigate the issue, but doing so is often impractical in real-world settings.
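To make the reward-model side of RLHF concrete, here is a minimal sketch of the standard pairwise (Bradley-Terry) preference loss such models are commonly trained with; the function name, shapes, and toy usage are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected),
    which pushes the preferred response's reward above the rejected one's."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random scalars stand in for the outputs of a real reward head.
r_chosen = torch.randn(8, requires_grad=True)
r_rejected = torch.randn(8)
pairwise_preference_loss(r_chosen, r_rejected).backward()
```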

The paper situates its contribution within two lines of related work. The first is reward modeling, in which reward models are trained on human preference data to guide the RLHF process or prompt optimization; recent research improves these models, and thereby LLM performance in RLHF, by raising the quality or quantity of the preference data. The second is mitigating overoptimization in RLHF: reward models often overfit and generalize poorly beyond their training data, and overly confident model outputs can be penalized with techniques such as label smoothing or SFT regularization.
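As one concrete way to penalize overconfidence, label smoothing can be applied directly to the pairwise loss above. This is a generic sketch of that idea, assuming the same scalar-reward interface, and not the specific regularizer studied in the paper.

```python
import torch.nn.functional as F

def smoothed_preference_loss(reward_chosen, reward_rejected, eps: float = 0.1):
    """Label-smoothed pairwise loss: the 'chosen' label is assumed correct
    only with probability 1 - eps, which discourages extreme reward margins."""
    margin = reward_chosen - reward_rejected
    return (-(1 - eps) * F.logsigmoid(margin)
            - eps * F.logsigmoid(-margin)).mean()
```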

Researchers from HKUST, the Georgia Institute of Technology, and the University of Illinois Urbana-Champaign have introduced the Generalizable Reward Model (GRM), which applies text-generation regularization to the hidden states of a reward model to improve its performance. Their study shows that all three types of text-generation regularization they test work well with GRM, with SFT regularization being the most effective and reliable. The results demonstrate that GRM substantially improves the accuracy of reward models on various out-of-distribution (OOD) tasks, consistently boosts RLHF performance, and helps reduce overoptimization.
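The core idea, as described, is to keep a language-model head alongside the reward head and add a text-generation loss over the same hidden states. The sketch below uses a tiny GRU backbone and an SFT-style next-token regularizer on the chosen response purely for illustration; the paper's actual architecture, regularizer variants, and hyperparameters differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGRMSketch(nn.Module):
    """Shared backbone with two heads: a scalar reward head trained on
    preference pairs, and an LM head whose next-token loss regularizes
    the same hidden states."""
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.reward_head = nn.Linear(hidden, 1)
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, ids: torch.Tensor):
        h, _ = self.backbone(self.embed(ids))            # (B, T, H)
        reward = self.reward_head(h[:, -1]).squeeze(-1)  # one scalar per sequence
        logits = self.lm_head(h)                         # (B, T, V)
        return reward, logits

def grm_style_loss(model, chosen_ids, rejected_ids, alpha: float = 0.01):
    r_c, logits_c = model(chosen_ids)
    r_r, _ = model(rejected_ids)
    pref = -F.logsigmoid(r_c - r_r).mean()               # pairwise reward loss
    # SFT regularizer: next-token prediction on the chosen response.
    sft = F.cross_entropy(logits_c[:, :-1].reshape(-1, logits_c.size(-1)),
                          chosen_ids[:, 1:].reshape(-1))
    return pref + alpha * sft

model = TinyGRMSketch()
chosen = torch.randint(0, 1000, (4, 16))
rejected = torch.randint(0, 1000, (4, 16))
grm_style_loss(model, chosen, rejected).backward()
```

The weight alpha trades off preference fitting against the generative regularizer; per the article, the SFT-style variant is the most effective and reliable of the regularizers the authors test.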

The Unified-Feedback dataset, one of the largest collections of pairwise feedback, is used to train the reward models. All reward models are trained on subsets of 400K and 40K instances from Unified-Feedback and evaluated on a held-out evaluation set of 8K instances. Performance on OOD preference data is evaluated with datasets such as HHH-Alignment, MT-Bench Human Judgements, and RewardBench. The HHH-Alignment dataset evaluates language models on helpfulness, honesty, and harmlessness, while the MT-Bench dataset contains human preferences for model responses to MT-Bench questions.
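On such held-out and OOD sets, reward models are typically scored by pairwise accuracy: how often the preferred response receives the higher reward. A minimal version, assuming the two-output model interface sketched above:

```python
import torch

@torch.no_grad()
def pairwise_accuracy(model, eval_pairs):
    """Fraction of (chosen, rejected) pairs where the reward model
    ranks the preferred response higher."""
    correct, total = 0, 0
    for chosen_ids, rejected_ids in eval_pairs:
        r_c, _ = model(chosen_ids)
        r_r, _ = model(rejected_ids)
        correct += (r_c > r_r).sum().item()
        total += r_c.numel()
    return correct / max(total, 1)
```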

Here are the key results from evaluating GRM:

• GRM substantially improves the generalization of reward models, yielding better performance on both in-distribution (ID) and OOD evaluation sets.
• All three types of text-generation regularization losses enhance generalization, with SFT regularization being the most effective and consistent.
• GRM remains strong even with limited training data, outperforming baselines by a wide margin.
• GRM effectively reduces overoptimization in best-of-n (BoN) sampling and PPO, and it is robust to label noise in the preference data.

In conclusion, the researchers have proposed the Generalizable Reward Model (GRM), an efficient method that aims to improve the generalizability and robustness of reward learning for LLMs. GRM applies regularization to the hidden states of reward models, which significantly improves their generalization to unseen data. The approach also effectively reduces overoptimization in RLHF. These results can support future work on stronger reward models, helping to align large models more efficiently and cost-effectively.

Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Generalizable Reward Model (GRM): An Efficient AI Approach to Improve the Generalizability and Robustness of Reward Learning for LLMs appeared first on MarkTechPost.
