Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout
- URL: http://arxiv.org/abs/2409.06477v1
- Date: Tue, 10 Sep 2024 13:05:45 GMT
- Title: Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout
- Authors: Atharva Gundawar, Yuchao Li, Dimitri Bertsekas
- Abstract summary: We introduce a new architecture for move selection, within which available chess engines are used as components.
One engine provides position evaluations in an approximation-in-value-space MPC/RL scheme, while a second engine serves as a nominal opponent.
We show that our architecture substantially improves the performance of the position evaluation engine.
- Score: 2.68187684471817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we apply model predictive control (MPC), rollout, and reinforcement learning (RL) methodologies to computer chess. We introduce a new architecture for move selection, within which available chess engines are used as components. One engine provides position evaluations in an approximation-in-value-space MPC/RL scheme, while a second engine serves as a nominal opponent, to emulate or approximate the moves of the true opponent player. We show that our architecture substantially improves the performance of the position evaluation engine. In other words, our architecture provides an additional layer of intelligence on top of the intelligence of the engines on which it is based. This is true for any engine, regardless of its strength: top engines such as Stockfish and Komodo Dragon (of varying strengths), as well as weaker engines. Structurally, our basic architecture selects moves by a one-move lookahead search, with an intermediate move generated by a nominal opponent engine, followed by a position evaluation by another chess engine. Simpler schemes that forego the use of the nominal opponent also perform better than the position evaluator, but not by quite as much. More complex schemes involving multistep lookahead may also be used, and generally tend to perform better as the length of the lookahead increases. Theoretically, our methodology relies on generic cost improvement properties and the superlinear convergence framework of Newton's method, which fundamentally underlies approximation in value space and the related MPC/RL and rollout/policy iteration schemes. A critical requirement of this framework is that the first lookahead step be executed exactly. This fact has guided our architectural choices, and is apparently an important factor in improving the performance of even the best available chess engines.
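The one-move lookahead scheme described in the abstract can be sketched generically: enumerate our candidate moves exactly, let a nominal opponent engine supply a reply to each, and score the resulting position with a separate evaluation engine. The interfaces and the toy subtraction game below are hypothetical stand-ins, not the paper's implementation; in practice the opponent and evaluator roles would be filled by real chess engines such as Stockfish.

```python
# Generic sketch of the one-move lookahead architecture. All interfaces
# here are hypothetical stand-ins for real chess engines.

def select_move(position, legal_moves, apply_move, opponent_move, evaluate):
    """Return the candidate move whose post-reply position scores best.

    The first lookahead step (our move) is enumerated exactly, as the
    framework requires; the opponent's reply comes from the nominal
    opponent engine rather than from an exhaustive search.
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves(position):
        after_us = apply_move(position, move)
        reply = opponent_move(after_us)  # nominal opponent engine
        after_reply = apply_move(after_us, reply) if reply is not None else after_us
        score = evaluate(after_reply)    # position evaluation engine
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Toy illustration on a subtraction game: the position is a pile of
# counters, each side removes 1-3 of them, and "we" prefer a smaller pile.
toy_moves = lambda n: [k for k in (1, 2, 3) if k <= n]
toy_apply = lambda n, k: n - k
toy_opponent = lambda n: 1 if n > 0 else None  # opponent always takes 1
toy_eval = lambda n: -n
```

With a pile of 7, the scheme removes 3: the nominal opponent's reply then leaves a pile of 3, the best score reachable after one exchange.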
Related papers
- Enhancing Chess Reinforcement Learning with Graph Representation [21.919003715442074]
We introduce a more general architecture based on Graph Neural Networks (GNN)
We show that this new architecture outperforms previous architectures with a similar number of parameters.
We also show that the model, when trained on a smaller $5\times 5$ variant of chess, is able to be quickly fine-tuned to play on regular $8\times 8$ chess.
arXiv Detail & Related papers (2024-10-31T09:18:47Z)
- Mastering Chess with a Transformer Model [0.0]
We show that transformers endowed with a sufficiently expressive position representation can match existing chess-playing models at a fraction of the computational cost.
Our architecture, which we call the Chessformer, significantly outperforms AlphaZero in both playing strength and puzzle solving ability with 8x less computation.
arXiv Detail & Related papers (2024-09-18T19:05:21Z)
- Evolving Virtual World with Delta-Engine [60.488864128937955]
We propose a special engine called the Delta-Engine to drive this virtual world.
The key feature of the delta-engine is its scalability to unknown elements within the world.
arXiv Detail & Related papers (2024-08-11T18:32:29Z)
- Predicting User Perception of Move Brilliance in Chess [3.434553688053531]
We show the first system for classifying chess moves as brilliant.
The system achieves an accuracy of 79% (with 50% base-rate), a PPV of 83%, and an NPV of 75%.
We show that a move is more likely to be predicted as brilliant, all things being equal, if a weaker engine considers it lower-quality.
arXiv Detail & Related papers (2024-06-14T17:46:26Z)
- Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill Learning [65.41865750258775]
General purpose agents will require large repertoires of skills.
We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable.
In a popular ant navigation domain, our four level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
arXiv Detail & Related papers (2023-07-06T02:27:05Z)
- Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNN).
Secondly, we will implement a Bag of Visual Words model and the VGG16 CNN architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
- Meta Mirror Descent: Optimiser Learning for Fast Convergence [85.98034682899855]
We take a different perspective starting from mirror descent rather than gradient descent, and meta-learning the corresponding Bregman divergence.
Within this paradigm, we formalise a novel meta-learning objective of minimising the regret bound of learning.
Unlike many meta-learned optimisers, it also supports convergence and generalisation guarantees and uniquely does so without requiring validation data.
arXiv Detail & Related papers (2022-03-05T11:41:13Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- LiveChess2FEN: a Framework for Classifying Chess Pieces based on CNNs [0.0]
We have implemented a functional framework that automatically digitizes a chess position from an image in less than 1 second.
We have analyzed different Convolutional Neural Networks for chess piece classification and how to map them efficiently on our embedded platform.
arXiv Detail & Related papers (2020-12-12T16:48:40Z)
- Playing Chess with Limited Look Ahead [0.0]
We train a deep neural network to serve as a static evaluation function.
We show that our static evaluation function has encoded some semblance of look ahead knowledge.
We show that, despite strict restrictions on look ahead depth, our engine recommends moves of equal strength in roughly $83\%$ of our sample positions.
arXiv Detail & Related papers (2020-07-04T16:02:43Z)
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.