
    This AI Paper by UC Berkeley Explores the Potential of Self-play Training for Language Models in Cooperative Tasks

    July 1, 2024

Artificial intelligence (AI) has seen major advances through game-playing agents like AlphaGo, which reached superhuman performance via self-play. Self-play improves a model by pitting identical copies of it against each other and training on the data those games generate, a technique that has pushed AI beyond human performance in zero-sum games such as Go and chess.
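To make this concrete, here is a minimal, illustrative self-play loop in Python. The stub `act` policy and toy game state are assumptions for illustration only; in an AlphaGo-style system the policy would be a learned model inside a full game engine.

```python
# Minimal sketch of self-play data collection: one policy plays BOTH sides
# of a game, and the finished episodes become training data for that policy.
# The random `act` stub and list-based game state are illustrative only.
import random

def act(state: list[int]) -> int:
    """Stub policy: pick a random legal move (a real system queries a model)."""
    return random.choice([0, 1])

def self_play_episode(max_turns: int = 10) -> list[tuple[list[int], int, int]]:
    state: list[int] = []
    trajectory = []
    for turn in range(max_turns):
        move = act(state)                                 # the SAME policy moves for both players
        trajectory.append((list(state), move, turn % 2))  # (state, move, player id)
        state.append(move)
    return trajectory                                     # later consumed as training data

dataset = [self_play_episode() for _ in range(100)]
print(len(dataset), "episodes collected")
```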

A persistent challenge in AI is improving performance on cooperative or partially cooperative language tasks. Unlike competitive games, where the objective is clear-cut, language tasks often demand collaboration and human interpretability. The open question is whether self-play, so effective in competitive settings, can be adapted to improve language models on tasks where cooperation with humans is essential. That requires an AI that communicates effectively and grasps the nuances of human language without drifting away from natural, human-like communication strategies.

Existing research includes models like AlphaGo and AlphaZero, which use self-play for competitive games. Collaborative dialogue tasks like Cards, CerealBar, OneCommon, and DialOp evaluate models in cooperative settings, using self-play as a proxy for human evaluation. Negotiation games like Deal or No Deal (DoND) and Craigslist Bargaining test models’ bartering abilities. However, these frameworks often struggle to keep model language human-interpretable and fail to generalize strategies across mixed cooperative and competitive environments, limiting their real-world applicability.

Researchers from the University of California, Berkeley, introduced a novel approach to testing self-play in both cooperative and competitive settings using a modified version of the negotiation game DoND. Originally semi-competitive, the game was adapted to support a range of objectives, making it suitable for evaluating language model improvements across different levels of collaboration. By modifying the reward structure, the game could simulate fully cooperative, semi-competitive, and strictly competitive environments, providing a versatile testbed for AI training.
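One plausible way to formalize those three objectives is shown below; the exact reward definitions are inferred from this summary rather than taken from the paper, so treat them as an assumption.

```python
# Hypothetical reward shaping for the modified Deal or No Deal game: the same
# negotiated split yields different rewards depending on the collaboration level.
# These formulas are an inference from the summary, not the paper's exact spec.
def reward(own_points: float, partner_points: float, setting: str) -> float:
    if setting == "cooperative":       # both players maximize the joint total
        return own_points + partner_points
    if setting == "semi_competitive":  # each player maximizes only its own value
        return own_points
    if setting == "competitive":       # strictly zero-sum: my gain is your loss
        return own_points - partner_points
    raise ValueError(f"unknown setting: {setting!r}")

# Example: a split worth 7 points to me and 3 to my partner
for s in ("cooperative", "semi_competitive", "competitive"):
    print(f"{s}: {reward(7, 3, s)}")
```

Note that only the reward changes; the dialogue interface and game rules stay fixed, which is what makes the game a controlled testbed across collaboration levels.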

In the modified DoND game, two players negotiate how to divide a set of items, each player holding a private value function over those items. The reward structure adjusts the game to the cooperative, semi-competitive, or competitive setting. The researchers trained via self-play with filtered behavior cloning: two identical language models played 500 games per round over ten rounds, and only high-scoring dialogues were used for finetuning. Initial models, including GPT-3.5 and GPT-4, were evaluated without few-shot examples to avoid bias. An OpenAI Gym-like environment managed game rules, message handling, and rewards. Human experiments were conducted on Amazon Mechanical Turk with pre-screened workers to validate model performance.
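The filtered behavior cloning loop described above might look roughly like this; `play_game` and `finetune` are placeholders for the model rollouts and weight updates, and the top-10% filter is an assumed threshold, since the summary says only that high-scoring dialogues were kept.

```python
# Sketch of filtered behavior cloning via self-play: each round, the current
# model plays 500 games against a copy of itself, the highest-scoring
# dialogues are kept, and the model is finetuned on them before the next round.
import random

def play_game(model: str) -> tuple[str, float]:
    """Placeholder: one self-play game, returning (dialogue, score)."""
    return f"dialogue-{random.random():.3f}", random.random()

def finetune(model: str, dialogues: list[str]) -> str:
    """Placeholder: a real implementation would update model weights here."""
    return model

model = "base-model"
for round_idx in range(10):                              # ten rounds of self-play
    games = [play_game(model) for _ in range(500)]       # 500 games per round
    cutoff = sorted(score for _, score in games)[int(0.9 * len(games))]
    kept = [d for d, score in games if score >= cutoff]  # keep top ~10% (assumed)
    model = finetune(model, kept)
```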

    The self-play training led to significant performance improvements. In cooperative and semi-competitive settings, models showed substantial gains, with scores improving by up to 2.5 times in cooperative and six times in semi-competitive scenarios compared to initial benchmarks. Specifically, models trained in the cooperative setting improved from an average score of 0.7 to 12.1, while in the semi-competitive setting, scores increased from 0.4 to 5.8. This demonstrates the potential of self-play to enhance language models’ ability to cooperate and compete effectively with humans, suggesting that these techniques can be adapted for more complex, real-world tasks.

    Despite the promising results in cooperative and semi-competitive environments, the strictly competitive setting posed challenges. Improvements were minimal, indicating that models tended to overfit during self-play. In this setting, models often struggled to generalize their strategies, failing to reach valid agreements with other agents, such as GPT-4. Preliminary human experiments further showed that these models rarely achieved agreements, highlighting the difficulty of applying self-play in zero-sum scenarios where robust, generalizable strategies are crucial.

In conclusion, this research by the UC Berkeley team underscores the potential of self-play for training language models on collaborative tasks. The findings challenge the prevailing assumptions that self-play is ineffective in cooperative domains and that models need extensive human data to keep their language interpretable. Instead, the significant improvements observed after just ten rounds of self-play suggest that language models with good generalization abilities can benefit from these techniques. This could extend self-play well beyond competitive games, improving AI performance on a range of collaborative, real-world tasks.

Check out the Paper and Code. All credit for this research goes to the researchers of this project.


    The post This AI Paper by UC Berkeley Explores the Potential of Self-play Training for Language Models in Cooperative Tasks appeared first on MarkTechPost.

