
    Allie: A Human-Aligned Chess Bot

    April 21, 2025

    Play against Allie on lichess!

    Introduction

In 1948, Alan Turing designed what might be the first chess-playing AI: a paper program that Turing himself acted as the computer for. Since then, chess has been a testbed for nearly every generation of AI advancement. After decades of improvement, today’s top chess engines like Stockfish and AlphaZero have far surpassed the capabilities of even the strongest human grandmasters.

However, most chess players are not grandmasters, and these state-of-the-art chess AIs have been described as playing more like aliens than fellow humans.

    The core problem here is that strong AI systems are not human-aligned; they are unable to match the diversity of skill levels of human partners and unable to model human-like behaviors beyond piece movement. Understanding how to make AI systems that can effectively collaborate with and be overseen by humans is a key challenge in AI alignment. Chess provides an ideal testbed for trying out new ideas towards this goal – while modern chess engines far surpass human ability, they are completely incapable of playing in a human-like way or adapting to match their human opponents’ skill levels. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game.

    What is Human-aligned Chess?

When we talk about “human-aligned” chess AI, what exactly do we mean? At its core, we want a system that is both humanlike, defined as making moves that feel natural to human players, and skill-calibrated, defined as capable of playing at a similar level against human opponents across the skill spectrum.

    Our goal here is quite different from traditional chess engines like Stockfish or AlphaZero, which are optimized solely to play the strongest moves possible. While these engines achieve superhuman performance, their play can feel alien to humans. They may instantly make moves in complex positions where humans would need time to think, or continue playing in completely lost positions where humans would normally resign.

    Building Allie

Figure 1: (a) A game state is represented as the sequence of moves that produced it, plus some metadata. This sequence is input to a Transformer, which predicts the next move, the pondering time for that move, and a value assessment of the position. (b) At inference time, we employ Monte-Carlo Tree Search with the value predictions from the model. The number of rollouts \(N_\mathrm{sim}\) is chosen dynamically based on the predicted pondering time.

    A Transformer model trained on transcripts of real games

While most prior deep learning approaches build models that take a board state as input and output a distribution over possible moves, we instead approach chess as a language modeling task. We use a Transformer architecture that inputs a sequence of moves rather than a single board state. Just as large language models learn to generate human-like text by training on vast text corpora, we hypothesized that a similar architecture could learn human-like chess by training on human game records. We train our chess “language” model on transcripts of over 93M games, encompassing a total of 6.6 billion moves, played on the chess website Lichess.
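To make the “chess as language modeling” framing concrete, here is a minimal sketch, not the authors’ implementation, of a decoder-only Transformer that reads a game as a sequence of move tokens and predicts the next move at every position. The vocabulary, layer sizes, and class names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MoveLM(nn.Module):
    """Decoder-only Transformer over move tokens (one token per half-move).
    Sizes are illustrative, not the paper's actual hyperparameters."""
    def __init__(self, vocab_size: int, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, move_ids: torch.Tensor) -> torch.Tensor:
        # move_ids: (batch, seq_len) integer move tokens.
        seq_len = move_ids.size(1)
        # Causal mask so each position attends only to earlier moves.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                          diagonal=1)
        h = self.backbone(self.embed(move_ids), mask=mask)
        return self.lm_head(h)  # next-move logits at every position
```

Training then mirrors ordinary language modeling: shift the move sequence by one position and minimize cross-entropy, exactly as one would for text.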

    Conditioning on Elo score

In chess, Elo scores typically range from 500 (beginner players) to 3000 (top chess professionals). To calibrate Allie’s playing strength to different levels of players, we model gameplay under a conditional generation framework, where encodings of the Elo ratings of both players are prepended to the game sequence. Specifically, we prefix each game with soft control tokens, which interpolate between a weak token, representing 500 Elo, and a strong token, representing 3000 Elo.

For a player with Elo rating \(k\), we compute a soft token \(e_k\) by linearly interpolating between the weak and strong tokens:

$$e_k = \gamma e_\text{weak} + (1-\gamma) e_\text{strong}$$

where \(\gamma = \frac{3000-k}{2500}\). During training, we prefix each game with two soft tokens corresponding to the two players’ strengths.
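As a small illustration of the soft-token computation (variable names like e_weak and e_strong are ours, not from the paper’s code):

```python
import torch

def elo_soft_token(k: float, e_weak: torch.Tensor,
                   e_strong: torch.Tensor) -> torch.Tensor:
    """Linearly interpolate between the weak (500 Elo) and strong
    (3000 Elo) control-token embeddings, per the formula above."""
    gamma = (3000.0 - k) / 2500.0
    return gamma * e_weak + (1.0 - gamma) * e_strong

# Example: a 1500-rated player gets gamma = 0.6, i.e. a token that is
# 60% weak and 40% strong. Embeddings here are random placeholders.
d_model = 512
e_weak, e_strong = torch.randn(d_model), torch.randn(d_model)
e_1500 = elo_soft_token(1500, e_weak, e_strong)
```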

    Learning objectives

    On top of the base Transformer model, Allie has three prediction objectives:

1. A policy head \(p_\theta\) that outputs a probability distribution over possible next moves
2. A pondering-time head \(t_\theta\) that outputs the number of seconds a human player would take to come up with the move
3. A value assessment head \(v_\theta\) that outputs a scalar estimate of who is expected to win the game

All three heads are individually parametrized as linear layers applied to the final hidden state of the decoder. Given a dataset \(\mathcal{D}\) of chess games, each represented as a sequence of moves \(\mathbf{m}\), the human ponder time before each move \(\mathbf{t}\), and the game outcome \(v\), we train Allie to minimize the negative log-likelihood of next moves and the mean squared error of the time and value predictions:

$$\mathcal{L}(\theta) = \sum_{(\mathbf{m}, \mathbf{t}, v) \in \mathcal{D}} \sum_{1 \le i \le N} \left( -\log p_\theta(m_i \mid \mathbf{m}_{<i}) + \left(t_\theta(\mathbf{m}_{<i}) - t_i\right)^2 + \left(v_\theta(\mathbf{m}_{<i}) - v\right)^2 \right).$$
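A hedged sketch of how the three heads and the combined loss might look in code; the Transformer backbone and batching are omitted, and all sizes are illustrative rather than the paper’s actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size = 512, 2048  # illustrative sizes

policy_head = nn.Linear(d_model, vocab_size)  # p_theta: next-move logits
time_head = nn.Linear(d_model, 1)             # t_theta: ponder seconds
value_head = nn.Linear(d_model, 1)            # v_theta: outcome estimate

def allie_loss(hidden: torch.Tensor, next_moves: torch.Tensor,
               ponder_times: torch.Tensor, outcome: torch.Tensor):
    """hidden: (N, d_model) final decoder states for the N positions of one
    game; next_moves: (N,) move-token targets; ponder_times: (N,) seconds;
    outcome: scalar game result, broadcast over all positions."""
    nll = F.cross_entropy(policy_head(hidden), next_moves, reduction="sum")
    time_mse = ((time_head(hidden).squeeze(-1) - ponder_times) ** 2).sum()
    value_mse = ((value_head(hidden).squeeze(-1) - outcome) ** 2).sum()
    return nll + time_mse + value_mse  # the three terms of the loss above
```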

    Adaptive Monte-Carlo Tree Search

At play-time, traditional chess engines like AlphaZero use search algorithms such as Monte-Carlo Tree Search (MCTS) to anticipate many moves into the future, evaluating different possibilities for how the game might go. The search budget \(N_\mathrm{sim}\) is almost always fixed: the engine spends the same amount of compute on search regardless of whether the best next move is extremely obvious or pivotal to the outcome of the game.

This fixed budget doesn’t match human behavior; humans naturally spend more time analyzing critical or complex positions compared to simple ones. In Allie, we introduce a time-adaptive MCTS procedure that varies the amount of search based on Allie’s prediction of how long a human would think in each position. If Allie predicts a human would spend more time on a position, it performs more search iterations to better match human depth of analysis. To keep things simple, we just set the number of search iterations \(N_\mathrm{sim}\) in proportion to the predicted pondering time.
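A minimal sketch of this budget rule; the proportionality constant and the clamping bounds are our illustrative assumptions, not the paper’s exact settings:

```python
def search_budget(predicted_ponder_seconds: float,
                  sims_per_second: int = 10,
                  min_sims: int = 1,
                  max_sims: int = 800) -> int:
    """Scale the MCTS rollout count N_sim with the model's predicted
    human pondering time, clamped to a sensible range."""
    n_sim = round(predicted_ponder_seconds * sims_per_second)
    return max(min_sims, min(max_sims, n_sim))

# A position where a human would think for 12s gets 120 rollouts;
# an "obvious" 0.5s move gets only 5.
```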

    How does Allie Play?

To evaluate whether Allie is human-aligned, we assess its performance both on an offline dataset and online against real human players.

Figure 2: Allie significantly outperforms previous state-of-the-art methods. Adaptive search enables matching human moves at expert levels.

In offline games, Allie achieves state-of-the-art move-matching accuracy (defined as the percentage of moves made that match real human moves). It also models human resignation behavior and pondering time very well.

Figure 3: Allie’s time predictions are strongly correlated with ground-truth human time usage. In the figure, we show the median and IQR of Allie’s think time for different amounts of time spent by humans.
Figure 4: Allie learns to assign reliable value estimates to board states by observing game outcomes alone. We report the Pearson’s r correlation of value estimates by Allie and Stockfish with game outcomes.

Another main insight of our paper is that adaptive search enables remarkable skill calibration against players across the skill spectrum. Against players rated from 1100 to 2500 Elo, the adaptive-search variant of Allie has an average skill gap of only 49 Elo points. In other words, Allie (with adaptive search) wins about 50% of games against opponents ranging from beginner to expert level. Notably, none of the other methods (not even the non-adaptive MCTS baseline) can match the strength of 2500-Elo players.

Table 1: Adaptive search enables remarkable skill calibration. Mean and maximum skill calibration errors are computed by binning human players into 200-Elo groups. We also report each system’s estimated performance against players at the lower and upper ends of the Elo spectrum.

    Limitations and Future Work

Despite strong offline evaluation metrics and generally positive player feedback, Allie still exhibits occasional behaviors that feel non-humanlike. Players specifically noted Allie’s propensity toward late-game blunders and its tendency to spend too much time pondering positions where there is only one reasonable move. These observations suggest there is still room to improve our understanding of how humans allocate cognitive resources during chess play.

    For future work, we identify several promising directions. First, our approach heavily relies on available human data, which is plentiful for fast time controls but more limited for classical chess with longer thinking time. Extending our approach to model human reasoning in slower games, where players make more accurate moves with deeper calculation, represents a significant challenge. With the recent interest in reasoning models that make use of test-time compute, we hope that our adaptive search technique can be applied to improving the efficiency of allocating a limited compute budget.

If you are interested in learning more about this work, please check out our ICLR paper, Human-Aligned Chess With a Bit of Search.
