Agents of Change: Self-Evolving LLM Agents for Strategic Planning
- URL: http://arxiv.org/abs/2506.04651v1
- Date: Thu, 05 Jun 2025 05:45:24 GMT
- Title: Agents of Change: Self-Evolving LLM Agents for Strategic Planning
- Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang
- Abstract summary: We benchmark a progression of LLM-based agents, from a simple game-playing agent to systems capable of autonomously rewriting their own prompts and their player agent's code. Our results show that self-evolving agents, particularly when powered by models like Claude 3.7 and GPT-4o, outperform static baselines by autonomously adopting their strategies.
- Score: 17.67637003848376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in LLMs have enabled their use as autonomous agents across a range of tasks, yet they continue to struggle with formulating and adhering to coherent long-term strategies. In this paper, we investigate whether LLM agents can self-improve when placed in environments that explicitly challenge their strategic planning abilities. Using the board game Settlers of Catan, accessed through the open-source Catanatron framework, we benchmark a progression of LLM-based agents, from a simple game-playing agent to systems capable of autonomously rewriting their own prompts and their player agent's code. We introduce a multi-agent architecture in which specialized roles (Analyzer, Researcher, Coder, and Player) collaborate to iteratively analyze gameplay, research new strategies, and modify the agent's logic or prompt. By comparing manually crafted agents to those evolved entirely by LLMs, we evaluate how effectively these systems can diagnose failure and adapt over time. Our results show that self-evolving agents, particularly when powered by models like Claude 3.7 and GPT-4o, outperform static baselines by autonomously adopting their strategies, passing along sample behavior to game-playing agents, and demonstrating adaptive reasoning over multiple iterations.
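The abstract describes the multi-agent evolver only at the role level; the following is a minimal sketch of how an Analyzer/Researcher/Coder/Player loop of this kind could be wired together. All names here (`call_llm`, `run_matches`, the role prompts) are illustrative assumptions rather than the authors' implementation, and the Catanatron integration is reduced to a stub.

```python
"""Hypothetical sketch of a self-evolving agent loop (not the authors' code).

Roles: the Player plays a batch of games, the Analyzer diagnoses weaknesses,
the Researcher proposes strategies, and the Coder rewrites the player prompt.
"""
from dataclasses import dataclass, field


def call_llm(role: str, prompt: str) -> str:
    """Placeholder for an LLM API call (e.g. Claude 3.7 or GPT-4o)."""
    raise NotImplementedError


def run_matches(player_prompt: str, n_games: int = 10) -> dict:
    """Placeholder: play n_games in Catanatron with the current prompt
    and return a summary (win rate, victory points, notable decisions)."""
    raise NotImplementedError


@dataclass
class EvolvingAgent:
    player_prompt: str                       # prompt the Player agent uses
    history: list = field(default_factory=list)

    def evolve(self, iterations: int = 5) -> str:
        for i in range(iterations):
            results = run_matches(self.player_prompt)            # Player
            analysis = call_llm(
                "Analyzer",
                f"Diagnose weaknesses in these game results:\n{results}")
            research = call_llm(
                "Researcher",
                f"Given these weaknesses, suggest Catan strategies:\n{analysis}")
            self.player_prompt = call_llm(                        # Coder
                "Coder",
                "Rewrite the player prompt to implement the strategy.\n"
                f"Current prompt:\n{self.player_prompt}\nStrategy:\n{research}")
            self.history.append((i, results, analysis, research))
        return self.player_prompt
```

Per the abstract, the Coder role can rewrite the player agent's code as well as its prompt; the same loop shape applies with a code string in place of `player_prompt`.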
Related papers
- Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training.
We synthesize self-reflected trajectories that include reflections and corrections of error steps.
Experiments demonstrate that our method improves agent performance across three representative tasks (a hedged sketch of the partial-masking idea appears after this list).
arXiv Detail & Related papers (2025-05-26T14:11:12Z)
- The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners [3.5083201638203154]
We evaluate the role of agentic sophistication in shaping artificial reasoners' performance.
We benchmarked three agent designs: a simple game-theoretic model, an unstructured LLM-as-agent model, and an LLM integrated into a traditional agentic framework.
Our analysis, covering over 2000 reasoning samples across 25 agent configurations, shows that human-inspired cognitive structures can enhance LLM agents' alignment with human strategic behaviour.
arXiv Detail & Related papers (2025-05-14T13:51:24Z)
- FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory [51.96049148869987]
We present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory.
We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents.
Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios.
arXiv Detail & Related papers (2025-04-19T15:29:04Z)
- SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling [29.29604779151457]
This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents.
Our method paves the path towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
arXiv Detail & Related papers (2024-10-16T11:59:27Z)
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
These task solvers necessitate manually crafted prompts to inform task rules and regulate behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System [91.41155892086252]
We open-source a new AI agent library, AgentLite, which simplifies research investigation into LLM agents.
AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks.
We introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility.
arXiv Detail & Related papers (2024-02-23T06:25:20Z)
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [37.69298376616128]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities.
To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.
Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
arXiv Detail & Related papers (2023-10-29T09:02:57Z)
- The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI).
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
- AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
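Returning to the STeP entry above: the summary does not spell out how partial masking is applied, but one plausible reading is that tokens from erroneous steps are excluded from the fine-tuning loss while reflections and corrected steps are trained on normally. The sketch below illustrates that reading with standard label masking; the span boundaries and function names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are skipped by cross_entropy


def mask_error_spans(input_ids: torch.Tensor,
                     error_spans: list[tuple[int, int]]) -> torch.Tensor:
    """Copy token ids as labels, then blank out the spans covering
    erroneous steps so they contribute nothing to the loss (assumed scheme)."""
    labels = input_ids.clone()
    for start, end in error_spans:  # [start, end) token positions
        labels[..., start:end] = IGNORE_INDEX
    return labels


def masked_next_token_loss(logits: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy over [batch, seq, vocab] logits,
    ignoring masked positions."""
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_labels,
                           ignore_index=IGNORE_INDEX)
```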