Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
- URL: http://arxiv.org/abs/2408.10635v3
- Date: Tue, 29 Jul 2025 09:42:39 GMT
- Title: Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
- Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu
- Abstract summary: Large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information.
- Score: 32.657454056329875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional reinforcement learning and planning typically require vast amounts of data and training to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making in complex action spaces. We introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to search and update high-level strategies (as text), which are then refined and executed by low-level Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework for optimizing strategies through population-based self-play simulations without the need for any training data. We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information, including the Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games such as The Resistance: Avalon. Our results show that agents equipped with STRATEGIST outperform agents trained with traditional RL methods, other LLM-based skill-acquisition techniques, and pre-existing LLM agents across both game environments, and achieve performance comparable to human players.
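The abstract describes a bi-level loop: the LLM edits high-level strategies as text, while low-level MCTS self-play evaluates those edits. The Python sketch below illustrates that division of labor under stated assumptions: `call_llm` and `run_mcts_selfplay` are hypothetical stubs standing in for an LLM client and a game simulator, and the keep-if-it-wins population update is an illustrative simplification rather than the paper's exact improvement procedure.

```python
"""Minimal, illustrative sketch of a bi-level improvement loop in the spirit of
STRATEGIST: an LLM edits strategies as text (high level) and MCTS-driven self-play
scores them (low level). The stubs and the selection rule are assumptions."""
import random

def call_llm(prompt: str) -> str:
    # Stub: replace with a real LLM call that returns a revised strategy as text.
    return prompt.splitlines()[-1] + " (revised)"

def run_mcts_selfplay(strategy_a: str, strategy_b: str, n_games: int = 20) -> float:
    # Stub: replace with n_games of self-play in which low-level MCTS executes each
    # high-level strategy text; should return the win rate of strategy_a.
    return random.random()

def improve(seed_strategy: str, iterations: int = 10) -> str:
    """Population-based self-play improvement; no external training data is used."""
    population = [seed_strategy]
    for _ in range(iterations):
        # High level: the LLM proposes a textual revision of an existing strategy.
        parent = random.choice(population)
        candidate = call_llm(f"Improve this game strategy:\n{parent}")
        # Low level: score the candidate via MCTS self-play against the population.
        win_rate = sum(run_mcts_selfplay(candidate, opp) for opp in population) / len(population)
        if win_rate > 0.5:  # keep candidates that beat the current population on average
            population.append(candidate)
    # Return the strategy with the best average win rate against the others.
    return max(population, key=lambda s: sum(run_mcts_selfplay(s, o) for o in population))

if __name__ == "__main__":
    print(improve("Bid high cards early; bluff only when far behind."))
```

In a real setup the stubs would be backed by the actual LLM and an MCTS-equipped simulator for games such as GOPS or Avalon, and the improvement prompt could also include win-rate feedback from earlier simulations.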
Related papers
- Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards [50.21528417884747]
We introduce Omni-Thinker, a unified reinforcement learning framework that enhances large language model (LLM) performance across diverse tasks.
Our approach enables consistent optimization across task types and scales RL-based training to subjective domains.
Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging.
arXiv Detail & Related papers (2025-07-20T01:50:16Z) - DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy [15.472887575322133]
Large Language Models (LLMs) offer a promising alternative to equilibrium search for AI systems.
We propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy.
Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
arXiv Detail & Related papers (2025-06-11T12:25:32Z) - Strategy-Augmented Planning for Large Language Models via Opponent Exploitation [11.840105106884543]
We introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent-exploitation capabilities of LLM-based agents.
In the offline stage, we construct an explicit strategy space and collect strategy-outcome pair data for training the Strategy Evaluation Network (SEN).
During the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN.
arXiv Detail & Related papers (2025-05-13T11:41:10Z) - LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning [4.22453895366234]
This study aims to develop a Gomoku AI system based on large language models (LLMs).
The system is designed to understand and apply Gomoku strategies and logic to make rational decisions.
After extensive self-play training, the model's Gomoku-playing capabilities have been notably enhanced.
arXiv Detail & Related papers (2025-03-27T16:52:25Z) - SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks [110.20297293596005]
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks.
Existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit assignment over multiple turns while leveraging the generalization capabilities of LLMs.
We propose a novel RL algorithm, SWEET-RL, that uses a carefully designed optimization objective to train a critic model with access to additional training-time information.
Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on ColBench compared to other state-of-the-art multi-turn RL algorithms.
arXiv Detail & Related papers (2025-03-19T17:55:08Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs).
ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution.
Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - LLM-PySC2: Starcraft II learning environment for Large Language Models [16.918044347226104]
This paper introduces a new environment that serves to develop Large Language Model (LLM)-based decision-making methodologies.
This environment is the first to offer the complete StarCraft II action space, multi-modal observation interfaces, and a structured game knowledge database.
arXiv Detail & Related papers (2024-11-08T06:04:22Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning [70.16816087320585]
Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for enhancing the reasoning capabilities of LLMs.
Existing distillation methods underutilize the rich trajectory information generated by MCTS.
We propose AlphaLLM-CPL, a novel pairwise training framework that enables LLMs to self-improve through MCTS behavior distillation.
arXiv Detail & Related papers (2024-10-09T03:20:02Z) - Learning Strategy Representation for Imitation Learning in Multi-Agent Games [15.209555810145549]
We introduce the Strategy Representation for Imitation Learning (STRIL) framework, which effectively learns strategy representations in multi-agent games.
STRIL is a plug-in method that can be integrated into existing IL algorithms.
We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four.
arXiv Detail & Related papers (2024-09-28T14:30:17Z) - Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach [6.154304269581415]
Advanced large language models (LLMs) provide superior performance in complex human-like interactions.
However, LLMs are costly, too large for edge devices such as smartphones, and harder to self-host, leading to security and privacy concerns.
This paper introduces a novel interpretable knowledge distillation approach to enhance the performance of smaller, more economical LLMs.
arXiv Detail & Related papers (2024-08-13T23:59:36Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z) - Large Language Models as Agents in Two-Player Games [12.303405412105187]
This paper delineates the parallels between the training methods of large language models (LLMs) and the strategies employed for the development of agents in two-player games.
We propose a re-conceptualization of LLM learning processes in terms of agent learning in language-based games.
arXiv Detail & Related papers (2024-02-12T21:44:32Z) - K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning [76.3114831562989]
Strategic reasoning requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments.
We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)". (A minimal sketch of the generic k-level recursion appears after this list.)
arXiv Detail & Related papers (2024-02-02T16:07:05Z) - ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z) - Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z) - Strategic Reasoning with Language Models [35.63300060111918]
Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations.
Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.
This paper introduces an approach that uses pretrained Large Language Models with few-shot chain-of-thought examples to enable strategic reasoning for AI agents.
arXiv Detail & Related papers (2023-05-30T16:09:19Z) - Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning [12.170248966278281]
In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given number of agents.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
arXiv Detail & Related papers (2021-08-30T04:30:53Z) - Rethinking Supervised Learning and Reinforcement Learning in
Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
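Among the related papers above, K-Level Reasoning (K-R) relies on recursive opponent modeling: a level-k agent best-responds to an opponent assumed to reason at level k-1. The sketch below shows that generic game-theoretic recursion on a toy symmetric payoff matrix; the matrix, action labels, and function names are illustrative assumptions and do not reproduce the K-R paper's LLM prompting framework.

```python
"""Illustrative numeric analogue of k-level reasoning: a level-k player best-responds
to an opponent assumed to reason at level k-1, bottoming out at a naive level-0 action.
The payoff matrix is a made-up symmetric game, not data from any cited paper."""

# PAYOFF[i][j]: row player's payoff when the row player picks action i and the
# opponent picks action j (symmetric game, so both players share this matrix).
PAYOFF = [
    [3.0, 0.0],  # action 0 (e.g., "cooperate")
    [5.0, 1.0],  # action 1 (e.g., "defect")
]

def best_response(opponent_action: int) -> int:
    # Choose the action with the highest payoff against the opponent's action.
    return max(range(len(PAYOFF)), key=lambda a: PAYOFF[a][opponent_action])

def k_level_action(k: int, level0_action: int = 0) -> int:
    """Level 0 plays a fixed naive action; level k best-responds to level k-1."""
    if k == 0:
        return level0_action
    return best_response(k_level_action(k - 1, level0_action))

if __name__ == "__main__":
    for k in range(4):
        print(f"level {k} plays action {k_level_action(k)}")
```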