
    Allie: A Human-Aligned Chess Bot

    April 21, 2025

    Play against Allie on lichess!

    Introduction

In 1948, Alan Turing designed what might be the first chess-playing AI: a paper program for which Turing himself acted as the computer. Since then, chess has been a testbed for nearly every generation of AI advancement. After decades of improvement, today’s top chess engines like Stockfish and AlphaZero have far surpassed the capabilities of even the strongest human grandmasters.

However, most chess players are not grandmasters, and these state-of-the-art chess AIs have been described as playing more like aliens than like fellow humans.

    The core problem here is that strong AI systems are not human-aligned; they are unable to match the diversity of skill levels of human partners and unable to model human-like behaviors beyond piece movement. Understanding how to make AI systems that can effectively collaborate with and be overseen by humans is a key challenge in AI alignment. Chess provides an ideal testbed for trying out new ideas towards this goal – while modern chess engines far surpass human ability, they are completely incapable of playing in a human-like way or adapting to match their human opponents’ skill levels. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game.

    What is Human-aligned Chess?

    When we talk about “human-aligned” chess AI, what exactly do we mean? At its core, we want a system that is both humanlike, defined as making moves that feel natural to human players, as well as skill-calibrated, defined as capable of playing at a similar level against human opponents across the skill spectrum.

    Our goal here is quite different from traditional chess engines like Stockfish or AlphaZero, which are optimized solely to play the strongest moves possible. While these engines achieve superhuman performance, their play can feel alien to humans. They may instantly make moves in complex positions where humans would need time to think, or continue playing in completely lost positions where humans would normally resign.

    Building Allie

Allie's system design
Figure 1: (a) A game state is represented as the sequence of moves that produced it, plus some metadata. This sequence is input to a Transformer, which predicts the next move, the pondering time for that move, and a value assessment of the position. (b) At inference time, we employ Monte-Carlo Tree Search with the value predictions from the model. The number of rollouts \(N_{\mathrm{sim}}\) is chosen dynamically based on the predicted pondering time.

    A Transformer model trained on transcripts of real games

While most prior deep learning approaches build models that take a board state as input and output a distribution over possible moves, we instead approach chess as a language modeling task. We use a Transformer architecture that inputs a sequence of moves rather than a single board state. Just as large language models learn to generate human-like text by training on vast text corpora, we hypothesized that a similar architecture could learn human-like chess by training on human game records. We train our chess “language” model on transcripts of over 93M games encompassing a total of 6.6 billion moves, which were played on the chess website Lichess.
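The move-sequence framing can be sketched as follows. The move strings and vocabulary construction here are illustrative assumptions, not the actual tokenization Allie uses:

```python
# Sketch: a chess game as a token sequence for a decoder-only Transformer,
# analogous to language modeling. Move strings (UCI-style) and the vocabulary
# scheme below are hypothetical, for illustration only.

def build_vocab(games):
    """Assign an integer ID to every distinct move string seen in the data."""
    vocab = {"<start>": 0}
    for game in games:
        for move in game:
            vocab.setdefault(move, len(vocab))
    return vocab

def encode_game(moves, vocab):
    """Turn a list of move strings into token IDs, prefixed by a start token."""
    return [vocab["<start>"]] + [vocab[m] for m in moves]

games = [["e2e4", "e7e5", "g1f3"], ["d2d4", "d7d5"]]
vocab = build_vocab(games)
ids = encode_game(games[0], vocab)
```

The resulting ID sequences would be fed to a standard next-token-prediction objective, exactly as in text language modeling.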

    Conditioning on Elo score

In chess, Elo ratings normally fall in the range of 500 (beginner players) to 3000 (top chess professionals). To calibrate the playing strength of Allie to different levels of players, we model gameplay under a conditional generation framework, where encodings of the Elo ratings of both players are prepended to the game sequence. Specifically, we prefix each game with soft control tokens, which interpolate between a weak token, representing 500 Elo, and a strong token, representing 3000 Elo.

For a player with Elo rating \(k\), we compute a soft token \(e_k\) by linearly interpolating between the weak and strong tokens:

$$e_k = \gamma e_{\text{weak}} + (1-\gamma) e_{\text{strong}}$$

where \(\gamma = \frac{3000-k}{2500}\). During training, we prefix each game with two soft tokens corresponding to the two players’ strengths.
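The interpolation above is straightforward to sketch in code. The 2-dimensional embeddings here are toy placeholders; the real control-token embeddings are learned vectors:

```python
def soft_elo_token(k, e_weak, e_strong):
    """Linearly interpolate between the weak (500 Elo) and strong (3000 Elo)
    control-token embeddings, per the formula in the post:
    e_k = gamma * e_weak + (1 - gamma) * e_strong, gamma = (3000 - k) / 2500."""
    gamma = (3000 - k) / 2500
    return [gamma * w + (1 - gamma) * s for w, s in zip(e_weak, e_strong)]

# Toy 2-dimensional embeddings for illustration; real embeddings are learned.
e_weak = [1.0, 0.0]
e_strong = [0.0, 1.0]

midpoint = soft_elo_token(1750, e_weak, e_strong)  # halfway between 500 and 3000
```

At \(k = 500\) the soft token equals the weak token exactly, and at \(k = 3000\) it equals the strong token, so the scheme smoothly covers the whole rating range.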

    Learning objectives

    On top of the base Transformer model, Allie has three prediction objectives:

1. A policy head \(p_\theta\) that outputs a probability distribution over possible next moves
2. A pondering-time head \(t_\theta\) that outputs the number of seconds a human player would take to come up with the move
3. A value-assessment head \(v_\theta\) that outputs a scalar estimate of which side is expected to win the game

All three heads are individually parametrized as linear layers applied to the final hidden state of the decoder. Given a dataset \(\mathcal{D}\) of chess games, each represented as a sequence of moves \(\mathbf{m}\), human pondering times before each move \(\mathbf{t}\), and the game outcome \(v\), we train Allie to minimize the negative log-likelihood of next moves and the MSE of the time and value predictions:

$$\mathcal{L}(\theta) = \sum_{(\mathbf{m}, \mathbf{t}, v) \in \mathcal{D}} \sum_{1 \le i \le N} \left( -\log p_\theta(m_i \mid \mathbf{m}_{<i}) + \left(t_\theta(\mathbf{m}_{<i}) - t_i\right)^2 + \left(v_\theta(\mathbf{m}_{<i}) - v\right)^2 \right) \text{.}$$
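A minimal sketch of this combined objective for a single game, assuming the model's per-move outputs are already available as plain Python values (actual training would use batched tensors and an autodiff framework):

```python
import math

def allie_loss(policy_probs, time_preds, value_preds, moves, times, outcome):
    """Per-game training loss as in the post: negative log-likelihood of each
    played move, plus squared error on pondering time and on the game outcome.
    policy_probs[i] maps candidate moves to probabilities at step i (toy format)."""
    total = 0.0
    for i, move in enumerate(moves):
        total += -math.log(policy_probs[i][move])    # policy: next-move NLL
        total += (time_preds[i] - times[i]) ** 2     # pondering-time MSE
        total += (value_preds[i] - outcome) ** 2     # value-assessment MSE
    return total
```

If the model predicts every played move with probability 1 and matches the observed times and outcome exactly, the loss is zero; any miscalibration in any of the three heads adds to it.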

    Adaptive Monte-Carlo Tree Search

At play-time, traditional chess engines like AlphaZero use search algorithms such as Monte-Carlo Tree Search (MCTS) to anticipate many moves into the future, evaluating different possibilities for how the game might go. The search budget \(N_{\mathrm{sim}}\) is almost always fixed: the engine spends the same amount of compute on search regardless of whether the best next move is extremely obvious or pivotal to the outcome of the game.

This fixed budget doesn’t match human behavior; humans naturally spend more time analyzing critical or complex positions compared to simple ones. In Allie, we introduce a time-adaptive MCTS procedure that varies the amount of search based on Allie’s prediction of how long a human would think in each position. If Allie predicts a human would spend more time on a position, it performs more search iterations to better match human depth of analysis. To keep things simple, we just set \(N_{\mathrm{sim}}\) in proportion to the predicted pondering time.
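One plausible way to map a predicted pondering time to a search budget; the proportionality constant and clipping bounds below are illustrative assumptions, not values from the paper:

```python
import math

def adaptive_budget(predicted_seconds, sims_per_second=50, min_sims=1, max_sims=1000):
    """Scale the number of MCTS rollouts with the predicted human pondering
    time, so obvious positions get little search and critical ones get more.
    sims_per_second and the clipping bounds are hypothetical, for illustration."""
    n = math.ceil(predicted_seconds * sims_per_second)
    return max(min_sims, min(max_sims, n))
```

Under this scheme a snap move (near-zero predicted time) triggers almost no search, while a position a human would ponder for minutes gets the full budget.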

    How does Allie Play?

    To evaluate whether Allie is human-aligned, we evaluate its performance both on an offline dataset and online against real human players.

Figure 2: Allie significantly outperforms previous state-of-the-art methods. Adaptive search enables matching human moves even at expert levels.

On offline games, Allie achieves state-of-the-art move-matching accuracy (defined as the percentage of moves that match those played by real humans). It also models human resignation and pondering behavior very well.

Figure 3: Allie’s time predictions are strongly correlated with ground-truth human time usage. In the figure, we show the median and IQR of Allie’s think time for different amounts of time spent by humans.
Figure 4: Allie learns to assign reliable value estimates to board states by observing game outcomes alone. We report Pearson’s r correlation of the value estimates of Allie and Stockfish with game outcomes.

Another main insight of our paper is that adaptive search enables remarkable skill calibration against players across the skill spectrum. Against players rated from 1100 to 2500 Elo, the adaptive-search variant of Allie has an average skill gap of only 49 Elo points. In other words, Allie (with adaptive search) wins about 50% of games against opponents ranging from beginner to expert level. Notably, none of the other methods (not even the non-adaptive MCTS baseline) can match the strength of 2500 Elo players.

Table 1: Adaptive search enables remarkable skill calibration. Mean and maximum skill-calibration errors are measured by binning human players into 200-Elo groups. We also report each system’s estimated performance against players at the lower and upper ends of the Elo spectrum.

    Limitations and Future Work

Despite strong offline evaluation metrics and generally positive player feedback, Allie still exhibits occasional behaviors that feel non-humanlike. Players specifically noted Allie’s propensity for late-game blunders and for sometimes spending too much time pondering positions where there is only one reasonable move. These observations suggest there is still room to improve our understanding of how humans allocate cognitive resources during chess play.

    For future work, we identify several promising directions. First, our approach heavily relies on available human data, which is plentiful for fast time controls but more limited for classical chess with longer thinking time. Extending our approach to model human reasoning in slower games, where players make more accurate moves with deeper calculation, represents a significant challenge. With the recent interest in reasoning models that make use of test-time compute, we hope that our adaptive search technique can be applied to improving the efficiency of allocating a limited compute budget.

If you are interested in learning more about this work, please check out our ICLR paper, Human-Aligned Chess With a Bit of Search.
