Agents of Change: Self-Evolving LLM Agents for Strategic Planning
- URL: http://arxiv.org/abs/2506.04651v2
- Date: Mon, 13 Oct 2025 08:57:19 GMT
- Title: Agents of Change: Self-Evolving LLM Agents for Strategic Planning
- Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang,
- Abstract summary: HexMachina is a continual learning multi-agent system that separates environment discovery from strategy improvement.<n>In controlled Catanatron experiments, HexMachina learns from scratch and evolves players that outperform the strongest human-crafted baseline.
- Score: 28.172006841163938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the long-horizon gap in large language model (LLM) agents by enabling them to sustain coherent strategies in adversarial, stochastic environments. Settlers of Catan provides a challenging benchmark: success depends on balancing short- and long-term goals amid randomness, trading, expansion, and blocking. Prompt-centric LLM agents (e.g., ReAct, Reflexion) must re-interpret large, evolving game states each turn, quickly saturating context windows and losing strategic consistency. We propose HexMachina, a continual learning multi-agent system that separates environment discovery (inducing an adapter layer without documentation) from strategy improvement (evolving a compiled player through code refinement and simulation). This design preserves executable artifacts, allowing the LLM to focus on high-level strategy rather than per-turn reasoning. In controlled Catanatron experiments, HexMachina learns from scratch and evolves players that outperform the strongest human-crafted baseline (AlphaBeta), achieving a 54% win rate and surpassing prompt-driven and no-discovery baselines. Ablations confirm that isolating pure strategy learning improves performance. Overall, artifact-centric continual learning transforms LLMs from brittle stepwise deciders into stable strategy designers, advancing long-horizon autonomy.
Related papers
- MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation [11.222602737031101]
We propose MAGE, a meta-RL framework that empowers LLM agents for strategic exploration and exploitation.<n>MAGE utilizes a multi-episode training regime where interaction histories and reflections are integrated into the context window.<n>Experiment results show that MAGE outperforms existing baselines in both exploration and exploitation tasks.
arXiv Detail & Related papers (2026-03-04T03:14:37Z) - Expanding LLM Agent Boundaries with Strategy-Guided Exploration [51.98616048282804]
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding.<n>We propose Strategy-Guided Exploration (SGE) to shift exploration from low-level actions to higher-level language strategies.
arXiv Detail & Related papers (2026-03-02T16:28:39Z) - rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection [49.74493901036598]
Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs)<n>This paper proposes a novel reinforced strategy injection mechanism (rSIM) that enables any LLM to become an RLM by employing a small planner.<n> Experimental results show that rSIM enables Qwen2.5-0.5B to become an RLM and significantly outperform Qwen2.5-14B.
arXiv Detail & Related papers (2025-12-09T06:55:39Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.<n>We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization.<n>Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments [48.892997022500765]
We introduce PillagerBench, a framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft.<n>We also propose TactiCrafter, an LLM-based multi-agent system that facilitates teamwork through human-readable tactics.<n>Our evaluation demonstrates that TactiCrafter outperforms baseline approaches and showcases adaptive learning through self-play.
arXiv Detail & Related papers (2025-09-07T22:51:12Z) - Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning [16.35236123729838]
We propose a hierarchical multi-agent framework that employs specialized imitation learning agents under a meta-controller called Strategic Planner (SP)<n>By expert demonstrations, each specialized agent learns a distinctive strategy, such as aerial support or defensive maneuvers, and produces coherent, structured multistep action sequences.<n>The SP then orchestrates these proposals into a single, environmentally adaptive plan that ensures local decisions align with long-term strategies.
arXiv Detail & Related papers (2025-08-08T05:57:12Z) - Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training.<n>We synthesize self-reflected trajectories that include reflections and corrections of error steps.<n>Experiments demonstrate that our method improves agent performance across three representative tasks.
arXiv Detail & Related papers (2025-05-26T14:11:12Z) - Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning [32.260964481673085]
Large language models (LLMs) struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment.<n>We propose an innovative framework that introduces a parameter-efficient and generally applicable hierarchy to LLM policies.<n>We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy.
arXiv Detail & Related papers (2025-05-26T09:43:40Z) - The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners [3.5083201638203154]
We evaluate the role of agentic sophistication in shaping artificial reasoners' performance.<n>We benchmarked three agent designs: a simple game-theoretic model, an unstructured LLM-as-agent model, and an LLM integrated into a traditional agentic framework.<n>Our analysis, covering over 2000 reasoning samples across 25 agent configurations, shows that human-inspired cognitive structures can enhance LLM agents' alignment with human strategic behaviour.
arXiv Detail & Related papers (2025-05-14T13:51:24Z) - FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory [51.96049148869987]
We present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory.<n>We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents.<n>Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios.
arXiv Detail & Related papers (2025-04-19T15:29:04Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Reinforcing Competitive Multi-Agents for Playing 'So Long Sucker' [0.12234742322758417]
This paper investigates the strategy game So Long Sucker (SLS) as a novel benchmark for multi-agent reinforcement learning (MARL)<n>We introduce the first publicly available computational framework for SLS, complete with a graphical user interface and benchmarking support for reinforcement learning algorithms.
arXiv Detail & Related papers (2024-11-17T12:38:13Z) - SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling [29.29604779151457]
This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents.
Our method paves the path towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
arXiv Detail & Related papers (2024-10-16T11:59:27Z) - Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search [32.657454056329875]
Large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with tasks that require detailed planning and decision-making.<n>We introduce STRATEGIST, a novel approach that integrates the strengths of both methods.<n>We demonstrate the effectiveness of STRATEGIST in learning optimal strategies for competitive, multi-turn games with partial information.
arXiv Detail & Related papers (2024-08-20T08:22:04Z) - Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
These task solvers necessitate manually crafted prompts to inform task rules and regulate behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z) - AgentLite: A Lightweight Library for Building and Advancing
Task-Oriented LLM Agent System [91.41155892086252]
We open-source a new AI agent library, AgentLite, which simplifies research investigation into LLM agents.
AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks.
We introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility.
arXiv Detail & Related papers (2024-02-23T06:25:20Z) - Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [37.69298376616128]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities.<n>To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.<n>Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
arXiv Detail & Related papers (2023-10-29T09:02:57Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI)
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.