Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
- URL: http://arxiv.org/abs/2406.11176v2
- Date: Tue, 24 Sep 2024 10:01:31 GMT
- Title: Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
- Authors: Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li,
- Abstract summary: Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
- Score: 50.481380478458945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as training data for the agent. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines. Moreover, our analytical findings highlight the effectiveness of IPR in augmenting action efficiency and its applicability to diverse models.
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, a llm-based autonomous agent framework.
At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence.
We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z) - Beyond Rewards: a Hierarchical Perspective on Offline Multiagent
Behavioral Analysis [14.656957226255628]
We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data.
arXiv Detail & Related papers (2022-06-17T23:07:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.