Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess
- URL: http://arxiv.org/abs/2401.16852v3
- Date: Tue, 17 Jun 2025 15:05:12 GMT
- Title: Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess
- Authors: Felix Helfenstein, Johannes Czech, Jannis Blüml, Max Eisel, Kristian Kersting,
- Abstract summary: We introduce M2CTS, a modular framework that adapts strategy dynamically based on game phase.<n>By routing decisions through specialized neural networks trained for each phase, M2CTS improves both computational efficiency and playing strength.<n>In experiments on chess, M2CTS achieves up to +122 Elo over standard single-model baselines.
- Score: 17.101742121345648
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In games like chess, strategy evolves dramatically across distinct phases - the opening, middlegame, and endgame each demand different forms of reasoning and decision-making. Yet, many modern chess engines rely on a single neural network to play the entire game uniformly, often missing opportunities to specialize. In this work, we introduce M2CTS, a modular framework that combines Mixture of Experts with Monte Carlo Tree Search to adapt strategy dynamically based on game phase. We explore three different methods for training the neural networks: Separated Learning, Staged Learning, and Weighted Learning. By routing decisions through specialized neural networks trained for each phase, M2CTS improves both computational efficiency and playing strength. In experiments on chess, M2CTS achieves up to +122 Elo over standard single-model baselines and shows promising generalization to multi-agent domains such as Pommerman. These results highlight how modular, phase-aware systems can better align with the structured nature of games and move us closer to human-like behavior in dividing a problem into many smaller units.
Related papers
- Predicting Chess Puzzle Difficulty with Transformers [0.0]
We present GlickFormer, a novel transformer-based architecture that predicts chess puzzle difficulty by approximating the Glicko-2 rating system.<n>The proposed model utilizes a modified ChessFormer backbone for spatial feature extraction and incorporates temporal information via factorized transformer techniques.<n>Results demonstrate GlickFormer's superior performance compared to the state-of-the-art ChessFormer baseline across multiple metrics.
arXiv Detail & Related papers (2024-10-14T20:39:02Z) - High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z) - Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - All by Myself: Learning Individualized Competitive Behaviour with a
Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z) - Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent
Models in Pommerman [14.668309037894586]
In combination with Reinforcement Learning, Monte-Carlo Tree Search has shown to outperform human grandmasters in games such as Chess, Shogi and Go.
We investigate techniques that transform general-sum multiplayer games into single-player and two-player games.
arXiv Detail & Related papers (2023-05-22T16:39:20Z) - Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z) - Exploring Adaptive MCTS with TD Learning in miniXCOM [0.0]
In this work, we explore on-line adaptivity in Monte Carlo tree search (MCTS) without requiring pre-training.
We present MCTS-TD, an adaptive MCTS algorithm improved with temporal difference learning.
We demonstrate our new approach on the game miniXCOM, a popular commercial franchise consisting of several turn-based tactical games.
arXiv Detail & Related papers (2022-10-10T21:04:25Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting.
arXiv Detail & Related papers (2021-11-19T12:58:59Z) - An Approach for Combining Multimodal Fusion and Neural Architecture
Search Applied to Knowledge Tracing [6.540879944736641]
We propose a sequential model based optimization approach that combines multimodal fusion and neural architecture search within one framework.
We evaluate our methods on two public real datasets showing the discovered model is able to achieve superior performance.
arXiv Detail & Related papers (2021-11-08T13:43:46Z) - Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z) - Mixture of ELM based experts with trainable gating network [2.320417845168326]
We propose an ensemble learning method based on mixture of experts.
The structure of ME consists of multi layer perceptrons (MLPs) as base experts and gating network.
In the proposed method a trainable gating network is applied to aggregate the outputs of the experts.
arXiv Detail & Related papers (2021-05-25T07:13:35Z) - Dual Monte Carlo Tree Search [0.0]
We show that Dual MCTS performs better than one of the most widely used neural MCTS algorithms, AlphaZero, for various symmetric and asymmetric games.
Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees using a combination of the PUCB, a sliding-window, and the epsilon-greedy algorithm.
arXiv Detail & Related papers (2021-03-21T23:34:11Z) - Model-Based Machine Learning for Communications [110.47840878388453]
We review existing strategies for combining model-based algorithms and machine learning from a high level perspective.
We focus on symbol detection, which is one of the fundamental tasks of communication receivers.
arXiv Detail & Related papers (2021-01-12T19:55:34Z) - Nested Mixture of Experts: Cooperative and Competitive Learning of
Hybrid Dynamical System [2.055949720959582]
We propose a nested mixture of experts (NMOE) for representing and learning hybrid dynamical systems.
An NMOE combines both white-box and black-box models while optimizing bias-variance trade-off.
An NMOE provides a structured method for incorporating various types of prior knowledge by training the associative experts cooperatively or competitively.
arXiv Detail & Related papers (2020-11-20T19:36:45Z) - Reinforcement Learning for Variable Selection in a Branch and Bound
Algorithm [0.10499611180329801]
We leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem.
We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task.
arXiv Detail & Related papers (2020-05-20T13:15:48Z) - Neural MMO v1.3: A Massively Multiagent Game Environment for Training
and Evaluating Neural Networks [48.5733173329785]
We present Neural MMO, a massively multiagent game environment inspired by MMOs.
We discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
arXiv Detail & Related papers (2020-01-31T18:50:02Z) - Smooth markets: A basic mechanism for organizing gradient-based learners [47.34060971879986]
We introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions.
SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms.
We show that SM-games are amenable to analysis and optimization using first-order methods.
arXiv Detail & Related papers (2020-01-14T09:19:39Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.