MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
- URL: http://arxiv.org/abs/2505.20670v2
- Date: Thu, 05 Jun 2025 09:49:45 GMT
- Title: MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning
- Authors: Zikang Guo, Benfeng Xu, Xiaorui Wang, Zhendong Mao
- Abstract summary: Complex tasks involving tool integration pose significant challenges for Large Language Models. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic workflows. We propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory based on observations.
- Score: 33.009759731505746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic workflows. However, existing approaches only exploit this capability in the post-action stage, where the agent observes the execution outcomes. We argue that, like humans, LLMs can also engage in reflection before action execution: the agent can anticipate undesirable outcomes from its own decisions, which not only provides a necessary, complementary perspective for evaluating the decision but also prevents the propagation of errors throughout the trajectory. In this paper, we propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory based on observations. This design systematically leverages LLM reflection capabilities to eliminate and rectify erroneous actions over a more comprehensive scope. Evaluations on both the StableToolBench and TravelPlanner benchmarks demonstrate MIRROR's superior performance, achieving state-of-the-art results compared to existing approaches.
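The two reflection stages compose into a simple per-step loop. Below is a minimal sketch of what such a step could look like; `propose`, `pre_critique`, `revise`, and `post_reflect` are hypothetical callables standing in for MIRROR's agents, and the paper's actual prompts, agent roles, and interfaces are not reproduced here.

```python
from typing import Callable

# Hedged sketch of one MIRROR-style step. All callables are hypothetical
# stand-ins for the framework's LLM agents and tool executor.
def mirror_step(
    task: str,
    history: list,
    propose: Callable[[str, list], str],
    pre_critique: Callable[[str, list, str], str],   # returns "ok" or a critique
    revise: Callable[[str, list, str, str], str],
    execute_tool: Callable[[str], str],
    post_reflect: Callable[[str, list, str, str], str],
    max_intra_rounds: int = 2,
) -> list:
    action = propose(task, history)
    # Intra-reflection: vet the intended action BEFORE execution so a
    # predictably bad tool call never enters (and pollutes) the trajectory.
    for _ in range(max_intra_rounds):
        verdict = pre_critique(task, history, action)
        if verdict == "ok":
            break
        action = revise(task, history, action, verdict)
    observation = execute_tool(action)
    # Inter-reflection: after observing the outcome, record a reflection that
    # subsequent steps condition on, rectifying errors that slipped through.
    reflection = post_reflect(task, history, action, observation)
    history.append({"action": action, "obs": observation, "reflect": reflection})
    return history
```

The point of the pre-execution loop is that a critique can discard a bad tool call at zero execution cost, before it can derail the rest of the trajectory.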
Related papers
- Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs [63.88783817420284]
Embodied robots cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials. We introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action and reflection-on-action. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight.
arXiv Detail & Related papers (2026-02-24T18:55:18Z)
- Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents [22.781523439717223]
A proper evaluation of an agent's performance must go beyond the final answer to also assess the problem-solving trajectory. We introduce TRACE, a framework for the multi-dimensional evaluation of tool-augmented LLM agent performance. Our results confirm that TRACE accurately evaluates these complex behaviors in a scalable and cost-effective manner.
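One way to picture a multi-dimensional trajectory evaluation is a per-dimension rubric judged independently. The sketch below is a guess at the shape of such an evaluator; the `DIMENSIONS` tuple and the `judge` callable are assumptions, not TRACE's actual axes or API.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative placeholder axes; TRACE's real dimensions are not listed here.
DIMENSIONS = ("tool_selection", "argument_validity", "efficiency")

@dataclass
class TrajectoryScore:
    scores: dict[str, float]  # dimension name -> score in [0, 1]

def evaluate_trajectory(
    task: str,
    trajectory: list[dict],             # steps of {"action": ..., "obs": ...}
    judge: Callable[[str], float],      # e.g. a cheap LLM judging a rubric prompt
) -> TrajectoryScore:
    scores = {}
    for dim in DIMENSIONS:
        # One rubric prompt per dimension keeps judging cheap and scalable,
        # rather than a single monolithic "was the final answer right?" check.
        prompt = f"Task: {task}\nSteps: {trajectory}\nRate {dim} from 0 to 1."
        scores[dim] = judge(prompt)
    return TrajectoryScore(scores)
```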
arXiv Detail & Related papers (2025-10-03T09:19:15Z)
- StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models [18.500046072165254]
We introduce StepORLM, a novel self-evolving framework with generative process supervision. At its core, StepORLM features a co-evolutionary loop where a policy model and a generative process reward model (GenPRM) iteratively improve on each other.
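A co-evolutionary loop of this kind can be pictured as alternating updates between the two models. The following sketch is purely illustrative: the method names on `policy` and `gen_prm` are invented, and StepORLM's actual objectives and update rules are not described in the abstract.

```python
# Hedged sketch of a policy/GenPRM co-evolution loop with invented interfaces.
def co_evolve(policy, gen_prm, problems, rounds=3):
    for _ in range(rounds):
        trajectories = [policy.solve(p) for p in problems]
        # The generative PRM critiques intermediate steps, not just outcomes.
        step_feedback = [gen_prm.score_steps(t) for t in trajectories]
        policy.update(trajectories, step_feedback)  # policy learns from process rewards
        gen_prm.update(trajectories)                # PRM learns from fresh policy rollouts
    return policy, gen_prm
```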
arXiv Detail & Related papers (2025-09-26T16:39:10Z)
- SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection [14.40651157974557]
SAMULE is a new framework for self-learning agents powered by a retrospective language model trained via Multi-Level Reflection Synthesis. It first synthesizes high-quality reflections across three complementary levels: Single-Trajectory Learning (micro-level) for detailed error correction; Intra-Task Learning (meso-level) to consolidate errors across multiple trials of the same task; and Inter-Task Learning (macro-level) to extract transferable insights from errors of the same type across diverse task failures.
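The three levels differ mainly in how failures are grouped before reflection, which suggests a grouping-based sketch like the one below; the record fields and the `reflect` callable are hypothetical stand-ins for SAMULE's synthesis pipeline.

```python
from collections import defaultdict

# Sketch of the three reflection levels over a pool of recorded failures.
def synthesize_reflections(failures, reflect):
    # failures: dicts like {"task_id": ..., "error_type": ..., "trace": ...}
    micro = [reflect("single_trajectory", [f]) for f in failures]   # per-failure fixes

    by_task = defaultdict(list)
    for f in failures:
        by_task[f["task_id"]].append(f)
    meso = [reflect("intra_task", trials)            # patterns across retries of one task
            for trials in by_task.values() if len(trials) > 1]

    by_error = defaultdict(list)
    for f in failures:
        by_error[f["error_type"]].append(f)
    macro = [reflect("inter_task", group)            # transferable insights per error type
             for group in by_error.values() if len(group) > 1]

    return micro + meso + macro   # training data for the retrospective model
```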
arXiv Detail & Related papers (2025-09-24T21:02:15Z)
- Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs [1.090218572228214]
This study investigates the potential of structured examples to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements.
arXiv Detail & Related papers (2025-08-20T09:36:53Z)
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation [53.732649189709285]
Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. We propose the Self-taught ActioN Deliberation (SAND) framework, enabling LLM agents to explicitly deliberate over candidate actions before committing to one. SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
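Deliberation before commitment can be sketched as sample-then-rank over candidate actions. The helpers below are hypothetical, and SAND's defining self-taught training loop, which distills these deliberations back into the agent, is not shown.

```python
# Minimal sketch of explicit action deliberation with hypothetical helpers.
def deliberate_and_act(state, sample_action, score, k=4):
    candidates = [sample_action(state) for _ in range(k)]
    # Weigh alternatives explicitly instead of committing to the first sample.
    return max(candidates, key=lambda action: score(state, action))
```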
arXiv Detail & Related papers (2025-07-10T05:38:15Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for strategic oversight and planning, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
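The hierarchy can be sketched as a planner-executor pair. The prompts below are invented, and the multi-agent reinforcement learning that ReMA uses to train the two agents jointly is omitted.

```python
# Sketch of the two-level decomposition with hypothetical agent callables.
def rema_answer(question, meta_agent, reasoning_agent, max_rounds=3):
    # High-level agent: strategic oversight and planning.
    plan = meta_agent(f"Devise a high-level plan for: {question}")
    # Low-level agent: detailed execution of the plan.
    attempt = reasoning_agent(f"Plan: {plan}\nSolve step by step: {question}")
    for _ in range(max_rounds - 1):
        review = meta_agent(
            f"Plan: {plan}\nAttempt: {attempt}\nRevise the plan or reply DONE."
        )
        if review.strip() == "DONE":
            break
        plan = review
        attempt = reasoning_agent(f"Plan: {plan}\nSolve step by step: {question}")
    return attempt
```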
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- Self-Corrective Task Planning by Inverse Prompting with Large Language Models [9.283971287618261]
We introduce InversePrompt, a novel self-corrective task planning approach. Our method incorporates reasoning steps to provide clear, interpretable feedback. Results on benchmark datasets show an average 16.3% higher success rate than existing LLM-based task planning methods.
arXiv Detail & Related papers (2025-03-10T13:35:51Z)
- Enhancing Relation Extraction via Supervised Rationale Verification and Feedback [12.687458877141934]
We propose a novel automated feedback framework for relation extraction. It presents a rationale supervisor to verify the rationale and provides re-selected demonstrations as feedback to correct the initial prediction. Our proposed framework significantly outperforms existing methods.
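The verify-then-feedback cycle might look like the following; the extractor, trained rationale supervisor, and demonstration-selection helper are all hypothetical stand-ins for the paper's components.

```python
# Sketch of supervised rationale verification for relation extraction.
def extract_with_feedback(instance, llm_extract, supervisor_ok, select_demos,
                          max_tries=2):
    demos = select_demos(instance)
    pred, rationale = llm_extract(instance, demos)
    for _ in range(max_tries):
        if supervisor_ok(instance, rationale):    # a trained verifier, not the LLM itself
            break
        # Feedback = re-selected demonstrations targeted at the flawed rationale.
        demos = select_demos(instance, bad_rationale=rationale)
        pred, rationale = llm_extract(instance, demos)
    return pred
```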
arXiv Detail & Related papers (2024-12-10T08:18:29Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
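Step-level guidance, in contrast to outcome-only rewards, can be sketched as scoring each step and harvesting corrective pairs. This is a generic illustration only; IPR's actual reward estimation and optimization objective are not specified in this summary.

```python
# Generic sketch of step-level supervision with hypothetical helpers.
def collect_step_pairs(rollout, step_reward, propose_alternative, threshold=0.5):
    pairs = []
    for i, step in enumerate(rollout):
        r = step_reward(rollout[:i], step)             # judge the step, not the episode
        if r < threshold:
            better = propose_alternative(rollout[:i])  # e.g. expert or resampled step
            pairs.append((step, better))               # corrective pair for tuning
    return pairs
```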
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Devil's Advocate: Anticipatory Reflection for LLM Agents [53.897557605550325]
Our approach prompts LLM agents to decompose a given task into manageable subtasks.
We implement a three-fold introspective intervention (a sketch follows this list):
Anticipatory reflection on potential failures and alternative remedies before action execution.
Post-action alignment with subtask objectives and backtracking with remedy to ensure utmost effort in plan execution.
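Wrapped around a single subtask, the intervention might look like the sketch below; every callable is a hypothetical stand-in, and the paper's prompts and backtracking policy are omitted.

```python
# Sketch of anticipatory reflection plus post-action alignment for one subtask.
def run_subtask(subtask, propose, anticipate, execute, aligned, backtrack):
    action = propose(subtask, avoid=None)
    # Anticipatory reflection: enumerate plausible failure modes and prepare a
    # remedy BEFORE executing, playing devil's advocate against the plan.
    risks = anticipate(subtask, action)
    remedy = propose(subtask, avoid=risks)
    result = execute(action)
    # Post-action alignment: check the outcome against the subtask objective;
    # on misalignment, backtrack and fall back to the prepared remedy.
    if not aligned(subtask, result):
        backtrack(subtask)
        result = execute(remedy)
    return result
```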
arXiv Detail & Related papers (2024-05-25T19:20:15Z)
- Learning to Use Tools via Cooperative and Interactive Agents [58.77710337157665]
Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility.
We propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately.
Our experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin.
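The division of labor among the three agents might be wired up as follows; the interfaces are assumptions rather than ConAgents' actual APIs.

```python
# Sketch of coordinating the three specialized agents named in the abstract.
def conagents_step(task, state, selector, executor, calibrator):
    tool, args = selector(task, state)     # agent 1: tool selection
    result = executor(tool, args)          # agent 2: tool execution (dict result assumed)
    if result.get("error"):
        # Agent 3 calibrates the failed action from the error message, rather
        # than one monolithic model handling every role.
        tool, args = calibrator(task, state, tool, args, result["error"])
        result = executor(tool, args)
    state.append((tool, args, result))
    return state
```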
arXiv Detail & Related papers (2024-03-05T15:08:16Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
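Exploration-based trajectory optimization can be sketched as collecting success/failure contrasts from repeated rollouts. The framing below is an assumption about the data flow, not ETO's stated recipe.

```python
# Sketch of the exploration phase framed as contrastive data collection.
def collect_contrastive_trajectories(tasks, rollout, is_success, tries=4):
    pairs = []
    for task in tasks:
        attempts = [rollout(task) for _ in range(tries)]       # exploration
        wins = [t for t in attempts if is_success(task, t)]
        fails = [t for t in attempts if not is_success(task, t)]
        if wins and fails:
            pairs.append({"task": task, "chosen": wins[0], "rejected": fails[0]})
    return pairs   # suitable input for a preference-style tuning step
```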
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [74.16170899755281]
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements, as well as a comprehensive evaluation toolkit. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront.
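A progress-rate metric of this kind could be computed as the fraction of annotated subgoals a trajectory reaches; the sketch below assumes hypothetical subgoal annotations and a `matches` predicate, not AgentBoard's concrete rules.

```python
# Sketch of a fine-grained progress-rate metric: credit partial progress via
# matched subgoals instead of binary task success.
def progress_rate(achieved_states, subgoals, matches):
    # matches(state, subgoal) -> bool; subgoals are task milestones.
    done = sum(
        1 for goal in subgoals
        if any(matches(state, goal) for state in achieved_states)
    )
    return done / len(subgoals) if subgoals else 0.0
```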
arXiv Detail & Related papers (2024-01-24T01:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.