360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System
- URL: http://arxiv.org/abs/2404.05569v2
- Date: Wed, 26 Jun 2024 11:42:10 GMT
- Title: 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System
- Authors: Shen Gao, Hao Li, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang
- Abstract summary: We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance.
We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
- Score: 71.96888731208838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360$^\circ$ performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce a dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360$^\circ$REA.
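The abstract names two mechanisms, a multi-perspective assessment and a dual-level experience pool, without giving implementation details. The following is a minimal Python sketch of how such a pairing could be wired together. It is not the authors' implementation: the `Review` and `ExperiencePool` classes, the `assess_360` function, the 0-10 score scale, and the choice of "per-agent" and "team-wide" as the two levels are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Review:
    reviewer: str  # who wrote the review (leader, peer, or the agent itself)
    target: str    # whose work is being reviewed
    score: float   # hypothetical 0-10 rating
    comment: str   # free-text feedback


@dataclass
class ExperiencePool:
    # Level 1 (assumed): lessons kept separately for each agent.
    per_agent: dict = field(default_factory=dict)
    # Level 2 (assumed): lessons shared by the whole team.
    team: list = field(default_factory=list)

    def add_agent_lesson(self, agent: str, lesson: str) -> None:
        self.per_agent.setdefault(agent, []).append(lesson)

    def add_team_lesson(self, lesson: str) -> None:
        self.team.append(lesson)

    def recall(self, agent: str) -> list:
        # An agent sees the shared lessons followed by its own.
        return self.team + self.per_agent.get(agent, [])


def assess_360(reviews: list) -> dict:
    """Group reviews by target and average the scores across all reviewers."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[review.target].append(review.score)
    return {target: sum(s) / len(s) for target, s in grouped.items()}


# Usage: three perspectives on one agent's output.
reviews = [
    Review("leader", "coder", 8.0, "Meets the spec."),
    Review("tester", "coder", 6.0, "Edge cases are missing."),
    Review("coder", "coder", 7.0, "Self-review: add input checks."),
]
scores = assess_360(reviews)

pool = ExperiencePool()
pool.add_team_lesson("Agree on the interface before coding.")
pool.add_agent_lesson("coder", "Handle empty input explicitly.")
```

In this sketch the feedback comments would be distilled into lessons by an LLM call, which is omitted here; the plain score average simply stands in for whatever aggregation a real system would use.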
Related papers
- Exploring Reasoning Reward Model for Agents [30.458783880389216]
Agentic Reinforcement Learning (Agentic RL) has achieved notable success in enabling agents to perform complex reasoning and tool use.
Most methods still rely on sparse outcome-based rewards for training.
We introduce Agent Reasoning Reward Model (Agent-RRM), a multi-faceted reward model that produces structured feedback for agentic trajectories.
arXiv Detail & Related papers (2026-01-29T18:59:52Z) - AgentEvolver: Towards Efficient Self-Evolving Agent System [51.54882384204726]
We present AgentEvolver, a self-evolving agent system that drives autonomous agent learning.
AgentEvolver introduces three synergistic mechanisms: self-questioning, self-navigating, and self-attributing.
Preliminary experiments indicate that AgentEvolver achieves more efficient exploration, better sample utilization, and faster adaptation compared to traditional RL-based baselines.
arXiv Detail & Related papers (2025-11-13T15:14:47Z) - A Comprehensive Empirical Evaluation of Agent Frameworks on Code-centric Software Engineering Tasks [14.762911285395047]
We evaluate seven general-purpose agent frameworks across three representative code-centric tasks.
Our findings reveal distinct capability patterns and trade-offs among the evaluated frameworks.
For overhead, software development incurs the highest monetary cost, while GPTswarm remains the most cost-efficient.
arXiv Detail & Related papers (2025-11-02T09:46:59Z) - Completion $\neq$ Collaboration: Scaling Collaborative Effort with Agents [48.95020665909723]
We argue for a shift from building and assessing task completion agents to developing collaborative agents.
We introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement.
arXiv Detail & Related papers (2025-10-29T17:47:18Z) - Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents [22.781523439717223]
A proper evaluation of an agent's performance must go beyond the final answer to also assess the problem-solving trajectory.
We introduce TRACE, a framework for the multi-dimensional evaluation of tool-augmented LLM agent performance.
Our results confirm that TRACE accurately evaluates these complex behaviors in a scalable and cost-effective manner.
arXiv Detail & Related papers (2025-10-03T09:19:15Z) - Establishing Best Practices for Building Rigorous Agentic Benchmarks [94.69724201080155]
We show that many agentic benchmarks have issues in task setup or reward design.
Such issues can lead to under- or overestimation of agents' performance by up to 100% in relative terms.
We introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience.
arXiv Detail & Related papers (2025-07-03T17:35:31Z) - An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring [8.779871128906787]
We introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring.
Our system associates each agent with a credibility score that is used when aggregating the team outputs.
arXiv Detail & Related papers (2025-05-30T05:57:37Z) - Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation.
During the experiential learning phase, we quantify the quality for each step in the task-solving workflow and store the resulting rewards.
During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
arXiv Detail & Related papers (2025-05-29T07:24:37Z) - On Multi-Agent Inverse Reinforcement Learning [8.284137254112848]
We extend the Inverse Reinforcement Learning (IRL) framework to the multi-agent setting, assuming we observe agents who follow Nash Equilibrium (NE) policies.
We provide an explicit characterization of the feasible reward set and analyze how errors in estimating the transition dynamics and expert behavior impact the recovered rewards.
arXiv Detail & Related papers (2024-11-22T16:31:36Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - Adaptive In-conversation Team Building for Language Model Agents [33.03550687362213]
Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks.
Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent.
A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods.
arXiv Detail & Related papers (2024-05-29T18:08:37Z) - Iterative Experience Refinement of Software-Developing Agents [81.09737243969758]
Large language models (LLMs) can leverage past experiences to reduce errors and enhance efficiency.
This paper introduces the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution.
arXiv Detail & Related papers (2024-05-07T11:33:49Z) - ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy [47.42940885853956]
A$^3$T is a framework that enables the Autonomous Annotation of Agent Trajectories in the style of ReAct.
In AlfWorld, the agent trained with A$^3$T obtains a 1-shot success rate of 96%, and 100% success with 4 iterative rounds.
arXiv Detail & Related papers (2024-03-21T17:43:44Z) - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z) - Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z)