Mastering Board Games by External and Internal Planning with Language Models
- URL: http://arxiv.org/abs/2412.12119v2
- Date: Tue, 29 Apr 2025 18:06:45 GMT
- Title: Mastering Board Games by External and Internal Planning with Language Models
- Authors: John Schultz, Jakub Adamek, Matej Jusup, Marc Lanctot, Michael Kaisers, Sarah Perrin, Daniel Hennes, Jeremy Shar, Cannada Lewis, Anian Ruoss, Tom Zahavy, Petar Veličković, Laurel Prince, Satinder Singh, Eric Malmi, Nenad Tomašev,
- Abstract summary: We show that search-based planning can yield significant improvements in Large Language Models' game-playing strength. We introduce, compare and contrast two major approaches: in external search, the model guides Monte Carlo Tree Search rollouts and evaluations without calls to an external game engine, and in internal search, the model is trained to generate in-context a linearized tree of search and a resulting final choice. Our proposed approach, combining search with domain knowledge, is not specific to board games, hinting at more general future applications.
- Score: 30.782334791241556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains. In this paper, we aim to demonstrate this across board games (Chess, Fischer Random / Chess960, Connect Four, and Hex), and we show that search-based planning can yield significant improvements in LLM game-playing strength. We introduce, compare and contrast two major approaches: In external search, the model guides Monte Carlo Tree Search (MCTS) rollouts and evaluations without calls to an external game engine, and in internal search, the model is trained to generate in-context a linearized tree of search and a resulting final choice. Both build on a language model pre-trained on relevant domain knowledge, reliably capturing the transition and value functions in the respective environments, with minimal hallucinations. We evaluate our LLM search implementations against game-specific state-of-the-art engines, showcasing substantial improvements in strength over the base model, and reaching Grandmaster-level performance in chess while operating closer to the human search budget. Our proposed approach, combining search with domain knowledge, is not specific to board games, hinting at more general future applications.
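The external-search idea in the abstract, where MCTS queries the model rather than a game engine for legal moves, next states, and position evaluations, can be illustrated with a minimal AlphaZero-style (PUCT) MCTS sketch. It is not the authors' implementation: `HypotheticalModel` and its methods (`legal_moves`, `next_state`, `prior`, `value`) are assumed stand-ins for the language model, and the toy game is single-pile Nim (take 1 to 3 stones; taking the last stone wins) so that the stub can return exact values.

```python
import math
from dataclasses import dataclass, field


class HypotheticalModel:
    """Hypothetical stand-in for the language model's learned game knowledge.

    Toy game (assumption): single-pile Nim; a move removes 1-3 stones and
    whoever takes the last stone wins. A state is the number of stones left.
    """

    def legal_moves(self, stones: int) -> list[int]:
        return [m for m in (1, 2, 3) if m <= stones]

    def next_state(self, stones: int, move: int) -> int:
        return stones - move

    def is_terminal(self, stones: int) -> bool:
        return stones == 0

    def prior(self, stones: int) -> dict[int, float]:
        # Uniform policy prior; a trained model would supply move probabilities.
        moves = self.legal_moves(stones)
        return {m: 1.0 / len(moves) for m in moves}

    def value(self, stones: int) -> float:
        # Value for the player to move, in [-1, 1]. Multiples of 4 are lost
        # (including 0: the opponent has just taken the last stone).
        return -1.0 if stones % 4 == 0 else 1.0


@dataclass
class Node:
    state: int
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0  # from the perspective of the player to move here
    children: dict = field(default_factory=dict)  # move -> Node


def puct_score(parent: Node, child: Node, c_puct: float) -> float:
    # The child's value is from the opponent's perspective, hence the sign flip.
    q = -child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u


def simulate(node: Node, model: HypotheticalModel, c_puct: float) -> float:
    """Run one simulation; return the value for the player to move at `node`."""
    if model.is_terminal(node.state):
        v = model.value(node.state)
    elif not node.children:
        # Expand with model-predicted moves and transitions, then evaluate the
        # leaf with the model's value estimate instead of calling a game engine.
        for move, p in model.prior(node.state).items():
            node.children[move] = Node(model.next_state(node.state, move), prior=p)
        v = model.value(node.state)
    else:
        child = max(node.children.values(),
                    key=lambda ch: puct_score(node, ch, c_puct))
        v = -simulate(child, model, c_puct)
    node.visits += 1
    node.value_sum += v
    return v


def search(model, root_state: int, num_simulations: int = 400,
           c_puct: float = 1.5) -> int:
    """Return the most-visited root move (root must be non-terminal)."""
    root = Node(root_state)
    for _ in range(num_simulations):
        simulate(root, model, c_puct)
    return max(root.children, key=lambda m: root.children[m].visits)


if __name__ == "__main__":
    print(search(HypotheticalModel(), 10))
```

Because every rule lookup goes through the model object, the search itself never consults the game's rules; with a real model in place of the stub, search quality would hinge on how reliably the model predicts transitions and values, which is the property the abstract emphasizes. From 10 stones the winning move is to take 2 (leaving a multiple of 4), and the visit counts should concentrate on it.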
Related papers
- Monte Carlo Planning with Large Language Model for Text-Based Game Agents [27.385517721352368]
We introduce the Monte Carlo planning with Dynamic Memory-guided Large language model (MC-DML) algorithm.
MC-DML leverages the language understanding and reasoning capabilities of Large Language Models (LLMs) alongside the exploratory advantages of tree search algorithms.
Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase.
(2025-04-23T16:23:15Z)
- TALES: Text Adventure Learning Environment Suite [28.997169350434795]
Reasoning is an essential skill to enable Large Language Models (LLMs) to interact with the world.
We introduce TALES, a diverse collection of synthetic and human-written text-adventure games designed to challenge and evaluate diverse reasoning capabilities.
Despite an impressive showing on synthetic games, even the top LLM-driven agents fail to achieve 15% on games designed for human enjoyment.
(2025-04-19T01:02:42Z)
- V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models [84.27290155010533]
V-MAGE is a game-based evaluation framework designed to assess visual reasoning capabilities of MLLMs.
We use V-MAGE to evaluate leading MLLMs, revealing significant challenges in their visual perception and reasoning.
(2025-04-08T15:43:01Z)
- Exploring Large Language Models for Word Games: Who is the Spy? [0.0]
This study explores how large language models (LLMs) can be effectively involved in word games.
We introduce a Chain-of-Thought (CoT)-based scheduling framework to enable LLMs to achieve excellent performance in tasks such as inferring role words and disguising their identities.
(2025-03-19T14:13:02Z)
- Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
An o1-like reasoning approach is challenging to implement, and researchers have been making various attempts to advance this open area of research.
We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
(2024-11-18T16:15:17Z)
- Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash [6.65572931991284]
Large Language Models (LLMs) have shown impressive capabilities in complex tasks and interactive environments.
This paper introduces a simulation framework utilizing the game Balderdash to evaluate both the creativity and logical reasoning of LLMs.
(2024-11-15T18:42:48Z)
- Explore the Reasoning Capability of LLMs in the Chess Testbed [45.12891789312405]
We propose improving the reasoning capability of large language models in chess by integrating annotated strategies and tactics.
We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting better chess moves.
(2024-11-11T01:42:56Z)
- clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents [19.989503513817095]
Large Language Models can be prompted to "self-play" conversational games that probe certain capabilities.
We take one of the proposed frameworks for setting up such game-play environments, and test its usefulness as an evaluation instrument.
(2024-05-31T14:43:31Z)
- Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models [0.0]
Prior work by Li et al. trained a GPT model on synthetic Othello games and found that the model learned an internal representation of the board state.
We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations.
Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character.
(2024-03-21T18:53:23Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
(2023-11-30T03:59:31Z)
- Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM).
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
(2023-10-23T05:52:09Z)
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
(2023-05-24T18:14:35Z)
- Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
(2023-03-23T17:43:17Z)
- PaLM-E: An Embodied Multimodal Language Model [101.29116156731762]
We propose embodied language models to incorporate real-world continuous sensor modalities into language models.
We train encodings of these sensor modalities end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks.
Our largest model, PaLM-E-562B with 562B parameters, is a visual-language generalist with state-of-the-art performance on OK-VQA.
(2023-03-06T18:58:06Z)
- Improving Chess Commentaries by Combining Language Models with Symbolic Reasoning Engines [31.87260568733666]
We show how to combine symbolic reasoning engines with controllable language models to generate chess commentaries.
We conduct experiments to demonstrate that our approach generates commentaries preferred by human judges over previous baselines.
(2022-12-15T23:38:31Z)
- Learning Chess Blindfolded: Evaluating Language Models on State Tracking [69.3794549747725]
We consider the task of language modeling for the game of chess.
Unlike natural language, chess notations describe a simple, constrained, and deterministic domain.
We find that transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences.
(2021-02-26T01:16:23Z)
- Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games [64.11746320061965]
We study reinforcement learning for text-based games, which are interactive simulations in the context of natural language.
We aim to conduct explicit reasoning with knowledge graphs for decision making, so that the actions of an agent are generated and supported by an interpretable inference procedure.
We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.
(2020-10-22T12:40:22Z)