Amortized Planning with Large-Scale Transformers: A Case Study on Chess
- URL: http://arxiv.org/abs/2402.04494v2
- Date: Mon, 21 Oct 2024 09:37:12 GMT
- Title: Amortized Planning with Large-Scale Transformers: A Case Study on Chess
- Authors: Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Cannada A. Lewis, Joel Veness, Tim Genewein,
- Abstract summary: This paper uses chess, a landmark planning problem in AI, to assess performance on a planning task.
ChessBench is a large-scale benchmark of 10 million chess games with legal move and value annotations (15 billion points) provided by Stockfish.
We show that, although a remarkably good approximation can be distilled into large-scale transformers via supervised learning, perfect distillation is still beyond reach.
- Score: 11.227110138932442
- License:
- Abstract: This paper uses chess, a landmark planning problem in AI, to assess transformers' performance on a planning task where memorization is futile $\unicode{x2013}$ even at a large scale. To this end, we release ChessBench, a large-scale benchmark dataset of 10 million chess games with legal move and value annotations (15 billion data points) provided by Stockfish 16, the state-of-the-art chess engine. We train transformers with up to 270 million parameters on ChessBench via supervised learning and perform extensive ablations to assess the impact of dataset size, model size, architecture type, and different prediction targets (state-values, action-values, and behavioral cloning). Our largest models learn to predict action-values for novel boards quite accurately, implying highly non-trivial generalization. Despite performing no explicit search, our resulting chess policy solves challenging chess puzzles and achieves a surprisingly strong Lichess blitz Elo of 2895 against humans (grandmaster level). We also compare to Leela Chess Zero and AlphaZero (trained without supervision via self-play) with and without search. We show that, although a remarkably good approximation of Stockfish's search-based algorithm can be distilled into large-scale transformers via supervised learning, perfect distillation is still beyond reach, thus making ChessBench well-suited for future research.
Related papers
- Predicting Chess Puzzle Difficulty with Transformers [0.0]
We present GlickFormer, a novel transformer-based architecture that predicts chess puzzle difficulty by approximating the Glicko-2 rating system.
The proposed model utilizes a modified ChessFormer backbone for spatial feature extraction and incorporates temporal information via factorized transformer techniques.
Results demonstrate GlickFormer's superior performance compared to the state-of-the-art ChessFormer baseline across multiple metrics.
arXiv Detail & Related papers (2024-10-14T20:39:02Z) - End-to-End Chess Recognition [11.15543089335477]
Current approaches use a pipeline of separate, independent, modules such as chessboard detection, square localization, and piece classification.
We explore an end-to-end approach to directly predict the configuration from the image, thus avoiding the error accumulation of the sequential approaches.
In contrast to existing datasets that are synthetically rendered and have only limited angles, ChessReD has photographs captured from various angles using smartphone cameras.
Our approach in chess recognition on the introduced challenging benchmark dataset outperforms related approaches, successfully recognizing the chess pieces' configuration in 15.26% of ChessReD's test images.
arXiv Detail & Related papers (2023-10-06T08:30:20Z) - Targeted Search Control in AlphaZero for Effective Policy Improvement [93.30151539224144]
We introduce Go-Exploit, a novel search control strategy for AlphaZero.
Go-Exploit samples the start state of its self-play trajectories from an archive of states of interest.
Go-Exploit learns with a greater sample efficiency than standard AlphaZero.
arXiv Detail & Related papers (2023-02-23T22:50:24Z) - Are AlphaZero-like Agents Robust to Adversarial Perturbations? [73.13944217915089]
AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin.
We ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions.
We develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space.
arXiv Detail & Related papers (2022-11-07T18:43:25Z) - Mastering the Game of Stratego with Model-Free Multiagent Reinforcement
Learning [86.37438204416435]
Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered.
Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome.
DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform.
arXiv Detail & Related papers (2022-06-30T15:53:19Z) - Measuring the Non-Transitivity in Chess [19.618609913302855]
We quantify the non-transitivity in Chess through real-world data from human players.
There exists a strong connection between the degree of non-transitivity and the progression of a Chess player's rating.
arXiv Detail & Related papers (2021-10-22T12:15:42Z) - Determining Chess Game State From an Image [19.06796946564999]
This paper puts forth a new dataset synthesised from a 3D model that is an order of magnitude larger than existing ones.
A novel end-to-end chess recognition system is presented that combines traditional computer vision techniques with deep learning.
The described system achieves an error rate of 0.23% per square on the test set, 28 times better than the current state of the art.
arXiv Detail & Related papers (2021-04-30T13:02:13Z) - Learning Chess Blindfolded: Evaluating Language Models on State Tracking [69.3794549747725]
We consider the task of language modeling for the game of chess.
Unlike natural language, chess notations describe a simple, constrained, and deterministic domain.
We find that transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences.
arXiv Detail & Related papers (2021-02-26T01:16:23Z) - Learning to Play Imperfect-Information Games by Imitating an Oracle
Planner [77.67437357688316]
We consider learning to play multiplayer imperfect-information games with simultaneous moves and large state-action spaces.
Our approach is based on model-based planning.
We show that the planner is able to discover efficient playing strategies in the games of Clash Royale and Pommerman.
arXiv Detail & Related papers (2020-12-22T17:29:57Z) - LiveChess2FEN: a Framework for Classifying Chess Pieces based on CNNs [0.0]
We have implemented a functional framework that automatically digitizes a chess position from an image in less than 1 second.
We have analyzed different Convolutional Neural Networks for chess piece classification and how to map them efficiently on our embedded platform.
arXiv Detail & Related papers (2020-12-12T16:48:40Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.