Related papers: Mixture of Masters: Sparse Chess Language Models with Player Routing

URL: http://arxiv.org/abs/2602.04447v1
Date: Wed, 04 Feb 2026 11:18:43 GMT
Title: Mixture of Masters: Sparse Chess Language Models with Player Routing
Authors: Giacomo Frisoni, Lorenzo Molfetta, Davide Freddi, Gianluca Moro,
Abstract summary: We introduce MoM, the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines.
Score: 11.12925453015974
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically (e.g., Tal's offensive vocation or Petrosian's defensive solidity). When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.
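
As a rough illustration of the per-move routing described in the abstract, the sketch below picks one expert per game state with a linear gate followed by a softmax. The abstract does not specify the gating architecture, so the linear gate, the two-feature state encoding, and the names `route_top1`, `GATE_WEIGHTS`, `tal_like`, and `petrosian_like` are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(
    state_features: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    expert_names: Sequence[str],
) -> Tuple[str, List[float]]:
    """Pick the single highest-probability expert for one game state.

    gate_weights holds one weight row per expert (hypothetical linear gate).
    Returns the chosen expert's name and the full routing distribution.
    """
    logits = [
        sum(w * x for w, x in zip(row, state_features)) for row in gate_weights
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return expert_names[best], probs


# Hypothetical setup: two state features (e.g. "open" vs. "closed" position
# indicators) and two persona experts. None of these values come from the paper.
EXPERT_NAMES = ["tal_like", "petrosian_like"]
GATE_WEIGHTS = [
    [2.0, -1.0],  # row for the attacking persona
    [-1.0, 2.0],  # row for the defensive persona
]

if __name__ == "__main__":
    chosen, distribution = route_top1([1.0, 0.0], GATE_WEIGHTS, EXPERT_NAMES)
    print(chosen, distribution)
```

Because the router returns the full distribution alongside the chosen name, the per-move choice of persona can be logged and inspected, which is one plausible way to read the interpretability claim in the abstract.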

Related papers

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents [56.25101378553328]
We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned keyboard-mouse inputs. Game-TARS is pre-trained on over 500B tokens with diverse trajectories and multimodal data. Experiments show that Game-TARS achieves about twice the success rate of the previous state-of-the-art model on open-world Minecraft tasks.
arXiv Detail & Related papers (2025-10-27T17:43:51Z)
Out-of-distribution Tests Reveal Compositionality in Chess Transformers [6.356179251855671]
We train a 270M-parameter chess Transformer and test it on out-of-distribution scenarios designed to reveal failures of systematic generalization. Our analysis shows that Transformers exhibit compositional generalization, as evidenced by strong rule extrapolation. In a more challenging test, we evaluate the models on variants including Chess960, a variant of chess in which the starting positions of the pieces are randomized.
arXiv Detail & Related papers (2025-10-23T17:51:28Z)
Explore the Reasoning Capability of LLMs in the Chess Testbed [45.12891789312405]
We propose improving the reasoning capability of large language models in chess by integrating annotated strategies and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models on the task of selecting better chess moves.
arXiv Detail & Related papers (2024-11-11T01:42:56Z)
Checkmating One, by Using Many: Combining Mixture of Experts with MCTS to Improve in Chess [17.101742121345648]
We introduce M2CTS, a modular framework that adapts strategy dynamically based on game phase. By routing decisions through specialized neural networks trained for each phase, M2CTS improves both computational efficiency and playing strength. In experiments on chess, M2CTS achieves up to +122 Elo over standard single-model baselines.
arXiv Detail & Related papers (2024-01-30T09:55:14Z)
All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time. We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman [14.668309037894586]
In combination with Reinforcement Learning, Monte-Carlo Tree Search has been shown to outperform human grandmasters in games such as Chess, Shogi and Go. We investigate techniques that transform general-sum multiplayer games into single-player and two-player games.
arXiv Detail & Related papers (2023-05-22T16:39:20Z)
Combining Deep Reinforcement Learning and Search with Generative Models for Game-Theoretic Opponent Modeling [30.465929764202155]
We introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS). We use this new method under the framework of Policy Space Response Oracles (PSRO) to automate the generation of an offline opponent model.
arXiv Detail & Related papers (2023-02-01T23:06:23Z)
Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition. We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
Simeon -- Secure Federated Machine Learning Through Iterative Filtering [74.99517537968161]
Federated learning enables a global machine learning model to be trained collaboratively by distributed, mutually non-trusting learning agents. A global model is distributed to clients, who perform training, and submit their newly-trained model to be aggregated into a superior model. A class of Byzantine-tolerant aggregation algorithms has emerged, offering varying degrees of robustness against these attacks. This paper presents Simeon: a novel approach to aggregation that applies a reputation-based iterative filtering technique.
arXiv Detail & Related papers (2021-03-13T12:17:47Z)
L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling. L2E acquires the ability to exploit opponents by a few interactions with different opponents during training. We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.