
    Allie: A Human-Aligned Chess Bot

    April 21, 2025

    Play against Allie on lichess!

    Introduction

In 1948, Alan Turing designed what might be the first chess-playing AI, a paper program that Turing himself acted as the computer for. Since then, chess has been a testbed for nearly every generation of AI advancement. After decades of improvement, today’s top chess engines like Stockfish and AlphaZero have far surpassed the capabilities of even the strongest human grandmasters.

However, most chess players are not grandmasters, and these state-of-the-art chess AIs have been described as playing more like aliens than fellow humans.

    The core problem here is that strong AI systems are not human-aligned; they are unable to match the diversity of skill levels of human partners and unable to model human-like behaviors beyond piece movement. Understanding how to make AI systems that can effectively collaborate with and be overseen by humans is a key challenge in AI alignment. Chess provides an ideal testbed for trying out new ideas towards this goal – while modern chess engines far surpass human ability, they are completely incapable of playing in a human-like way or adapting to match their human opponents’ skill levels. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game.

    What is Human-aligned Chess?

When we talk about “human-aligned” chess AI, what exactly do we mean? At its core, we want a system that is both humanlike, meaning it makes moves that feel natural to human players, and skill-calibrated, meaning it can play at a similar level against human opponents across the skill spectrum.

    Our goal here is quite different from traditional chess engines like Stockfish or AlphaZero, which are optimized solely to play the strongest moves possible. While these engines achieve superhuman performance, their play can feel alien to humans. They may instantly make moves in complex positions where humans would need time to think, or continue playing in completely lost positions where humans would normally resign.

    Building Allie

Figure 1: (a) A game state is represented as the sequence of moves that produced it, plus some metadata. This sequence is input to a Transformer, which predicts the next move, the pondering time for this move, and a value assessment of the position. (b) At inference time, we employ Monte-Carlo Tree Search with the value predictions from the model. The number of rollouts \(N_\mathrm{sim}\) is chosen dynamically based on the predicted pondering time.

    A Transformer model trained on transcripts of real games

While most prior deep learning approaches build models that take a board state as input and output a distribution over possible moves, we instead treat chess as a language modeling task. We use a Transformer architecture that inputs a sequence of moves rather than a single board state. Just as large language models learn to generate human-like text by training on vast text corpora, we hypothesized that a similar architecture could learn human-like chess by training on human game records. We train our chess “language” model on transcripts of over 93M games, encompassing a total of 6.6 billion moves, played on the chess website Lichess.
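To make the “chess as language modeling” framing concrete, here is a minimal sketch of how a game might be turned into a token sequence. This is an illustrative toy tokenizer, not Allie’s actual preprocessing; the move notation and function names are assumptions:

```python
# Minimal sketch (not Allie's actual tokenizer): a game becomes a flat
# sequence of move tokens, exactly as words become tokens in a language model.
def build_vocab(games):
    """Assign an integer id to every distinct move string, in order seen."""
    vocab = {}
    for game in games:
        for move in game:
            if move not in vocab:
                vocab[move] = len(vocab)
    return vocab

def encode(game, vocab):
    """Turn a list of moves into a list of token ids."""
    return [vocab[m] for m in game]

games = [["e2e4", "e7e5", "g1f3"], ["d2d4", "d7d5"]]
vocab = build_vocab(games)
ids = encode(games[0], vocab)
```

A Transformer trained on such sequences predicts the next move token given the whole game so far, rather than a single board snapshot.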

    Conditioning on Elo score

In chess, Elo scores normally fall in the range of 500 (beginner players) to 3000 (top chess professionals). To calibrate the playing strength of Allie to different levels of players, we model gameplay under a conditional generation framework, where encodings of the Elo ratings of both players are prepended to the game sequence. Specifically, we prefix each game with soft control tokens, which interpolate between a weak token, representing 500 Elo, and a strong token, representing 3000 Elo.

For a player with Elo rating \(k\), we compute a soft token \(e_k\) by linearly interpolating between the weak and strong tokens:

$$e_k = \gamma e_\text{weak} + (1-\gamma) e_\text{strong}$$

where \(\gamma = \frac{3000-k}{2500}\). During training, we prefix each game with two soft tokens corresponding to the two players’ strengths.
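The interpolation above can be sketched in a few lines. The embedding vectors here are stand-ins (the real ones are learned model parameters of some hidden dimension):

```python
import numpy as np

# Sketch of the Elo soft-token interpolation: gamma = (3000 - k) / 2500,
# so k = 500 gives the pure weak token and k = 3000 the pure strong token.
def soft_token(k, e_weak, e_strong):
    """Linearly interpolate between weak (500 Elo) and strong (3000 Elo) tokens."""
    gamma = (3000 - k) / 2500
    return gamma * e_weak + (1 - gamma) * e_strong

# Illustrative placeholder embeddings (learned vectors in the real model).
e_weak = np.zeros(4)
e_strong = np.ones(4)
mid_token = soft_token(1750, e_weak, e_strong)  # halfway between the two
```

A 1750-rated player, the midpoint of the range, gets an even mix of the two learned tokens.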

    Learning objectives

    On top of the base Transformer model, Allie has three prediction objectives:

1. A policy head \(p_\theta\) that outputs a probability distribution over possible next moves
2. A pondering-time head \(t_\theta\) that outputs the number of seconds a human player would take to come up with this move
3. A value head \(v_\theta\) that outputs a scalar estimate of the game outcome, i.e. who is expected to win

All three heads are individually parametrized as linear layers applied to the final hidden state of the decoder. Given a dataset of chess games, each represented as a sequence of moves \(\mathbf{m}\), human ponder times before each move \(\mathbf{t}\), and a game outcome \(v\), we train Allie to minimize the negative log-likelihood of next moves and the MSE of the time and value predictions:

$$\mathcal{L}(\theta) = \sum_{(\mathbf{m}, \mathbf{t}, v) \in \mathcal{D}} \sum_{1 \le i \le N} \left( -\log p_\theta(m_i \mid \mathbf{m}_{<i}) + \left(t_\theta(\mathbf{m}_{<i}) - t_i\right)^2 + \left(v_\theta(\mathbf{m}_{<i}) - v\right)^2 \right) \text{.}$$
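The per-position loss combines next-move cross-entropy with two squared errors. Here is a small numpy sketch of that computation, assuming the model’s head outputs have already been gathered per position (shapes and names are illustrative, not Allie’s actual code):

```python
import numpy as np

# Sketch of the training loss: cross-entropy on the next move, plus squared
# errors on predicted ponder time and predicted game outcome.
def allie_loss(move_logits, target_moves, time_pred, time_true, value_pred, outcome):
    """move_logits: (N, V) scores over the move vocabulary at each position;
    target_moves: (N,) indices of the moves actually played;
    time_pred/time_true: (N,) seconds; value_pred: (N,); outcome: scalar."""
    # Softmax over the move vocabulary (stabilized by subtracting the max).
    probs = np.exp(move_logits - move_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(target_moves)), target_moves])
    time_mse = (time_pred - time_true) ** 2
    value_mse = (value_pred - outcome) ** 2
    return float(np.sum(nll + time_mse + value_mse))
```

Each position contributes all three terms, so a single forward pass supervises move choice, think time, and outcome estimate at once.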

    Adaptive Monte-Carlo Tree Search

At play time, traditional chess engines like AlphaZero use search algorithms such as Monte-Carlo Tree Search (MCTS) to anticipate many moves into the future, evaluating different possibilities for how the game might go. The search budget \(N_\mathrm{sim}\) is almost always fixed: the engine spends the same amount of compute on search regardless of whether the best next move is extremely obvious or pivotal to the outcome of the game.

This fixed budget doesn’t match human behavior; humans naturally spend more time analyzing critical or complex positions than simple ones. In Allie, we introduce a time-adaptive MCTS procedure that varies the amount of search based on Allie’s prediction of how long a human would think in each position. If Allie predicts a human would spend more time on a position, it performs more search iterations to better match human depth of analysis. To keep things simple, we scale \(N_\mathrm{sim}\) with the predicted pondering time \(t_\theta\).
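The idea of tying the rollout budget to predicted think time can be sketched as follows. The scaling constant and clamping bounds here are illustrative assumptions, not Allie’s actual settings:

```python
import math

# Sketch of time-adaptive search: scale the MCTS rollout budget with the
# model's predicted human pondering time, clamped to a sane range.
def adaptive_budget(predicted_seconds, sims_per_second=100,
                    min_sims=1, max_sims=10_000):
    """Map a predicted ponder time (seconds) to a number of MCTS rollouts."""
    n_sim = math.ceil(predicted_seconds * sims_per_second)
    return max(min_sims, min(max_sims, n_sim))
```

An obvious recapture predicted to take a fraction of a second gets only a handful of rollouts, while a predicted long think buys correspondingly deeper search.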

    How does Allie Play?

    To evaluate whether Allie is human-aligned, we evaluate its performance both on an offline dataset and online against real human players.

Figure 2: Allie significantly outperforms previous state-of-the-art methods. Adaptive search enables matching human moves at expert levels.

In offline games, Allie achieves state-of-the-art move-matching accuracy (defined as the % of moves made that match real human moves). It also models human resignation behavior and pondering time very well.

Figure 3: Allie’s time predictions are strongly correlated with ground-truth human time usage. In the figure, we show the median and IQR of Allie’s think time for different amounts of time spent by humans.
Figure 4: Allie learns to assign reliable value estimates to board states by observing game outcomes alone. We report the Pearson’s r correlation of value estimates by Allie and Stockfish with game outcomes.

Another main insight of our paper is that adaptive search enables remarkable skill calibration against players across the skill spectrum. Against players rated from 1100 to 2500 Elo, the adaptive-search variant of Allie has an average skill gap of only 49 Elo points. In other words, Allie (with adaptive search) wins about 50% of games against both beginner- and expert-level opponents. Notably, none of the other methods (even the non-adaptive MCTS baseline) can match the strength of 2500 Elo players.

Table 1: Adaptive search enables remarkable skill calibration. Mean and maximum skill calibration errors are computed by binning human players into 200-Elo groups. We also report each system’s estimated performance against players at the lower and upper ends of the Elo spectrum.

    Limitations and Future Work

    Despite strong offline evaluation metrics and generally positive player feedback, Allie still exhibits occasional behaviors that feel non-humanlike. Players specifically noted Allie’s propensity toward late-game blunders and sometimes spending too much time pondering positions where there’s only one reasonable move. These observations suggest there’s still room to improve our understanding of how humans allocate cognitive resources during chess play.

    For future work, we identify several promising directions. First, our approach heavily relies on available human data, which is plentiful for fast time controls but more limited for classical chess with longer thinking time. Extending our approach to model human reasoning in slower games, where players make more accurate moves with deeper calculation, represents a significant challenge. With the recent interest in reasoning models that make use of test-time compute, we hope that our adaptive search technique can be applied to improving the efficiency of allocating a limited compute budget.

If you are interested in learning more about this work, please check out our ICLR paper, Human-Aligned Chess With a Bit of Search.
