SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection
- URL: http://arxiv.org/abs/2509.20562v1
- Date: Wed, 24 Sep 2025 21:02:15 GMT
- Title: SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection
- Authors: Yubin Ge, Salvatore Romeo, Jason Cai, Monica Sunkara, Yi Zhang,
- Abstract summary: SAMULE is a new framework for self-learning agents powered by a retrospective language model that is trained based on Multi-Level Reflection Synthesis. It first synthesizes high-quality reflections across three complementary levels: Single-Trajectory Learning (micro-level) for detailed error correction; Intra-Task Learning (meso-level) to build error taxonomies across multiple trials of the same task, and Inter-Task Learning (macro-level) to extract transferable insights based on same typed errors from diverse task failures.
- Score: 14.40651157974557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the rapid advancements in LLM agents, they still face the challenge of generating meaningful reflections due to inadequate error analysis and a reliance on rare successful trajectories, especially in complex tasks. In this work, we propose SAMULE, a new framework for self-learning agents powered by a retrospective language model that is trained based on Multi-Level Reflection Synthesis. It first synthesizes high-quality reflections across three complementary levels: Single-Trajectory Learning (micro-level) for detailed error correction; Intra-Task Learning (meso-level) to build error taxonomies across multiple trials of the same task, and Inter-Task Learning (macro-level) to extract transferable insights based on same typed errors from diverse task failures. Then we fine-tune a language model serving as the retrospective model to generate reflections during inference. We further extend our framework to interactive settings through a foresight-based reflection mechanism, enabling agents to proactively reflect and adapt during user interactions by comparing predicted and actual responses. Extensive experiments on three challenging benchmarks - TravelPlanner, NATURAL PLAN, and Tau-bench - demonstrate that our approach significantly outperforms reflection-based baselines. Our results highlight the critical role of well-designed reflection synthesis and failure-centric learning in building self-improving LLM agents.
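The three reflection levels named in the abstract can be pictured as three ways of grouping the same pool of failed trajectories. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Trajectory` fields, the `error_type` labels, and the `group_for_reflection` helper are all hypothetical names introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: str     # hypothetical task identifier
    trial: int       # index of the attempt on that task
    error_type: str  # hypothetical failure label, e.g. "budget"
    log: str         # free-text record of what the agent did


def group_for_reflection(failures):
    """Group failed trajectories by the three levels named in the abstract."""
    micro = list(failures)     # micro: each trajectory on its own
    meso = defaultdict(list)   # meso: trials of the same task
    macro = defaultdict(list)  # macro: same error type across tasks
    for t in failures:
        meso[t.task_id].append(t)
        macro[t.error_type].append(t)
    # Assumption: a macro group is only useful if it spans several tasks.
    macro = {
        err: ts
        for err, ts in macro.items()
        if len({t.task_id for t in ts}) > 1
    }
    return micro, dict(meso), macro
```

In this sketch each group would then be passed to a language model to write a reflection at the matching level; that step is omitted because the listing does not describe it.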
Related papers
- Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs [63.88783817420284]
Embodied robots cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials. We introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action and reflection-on-action. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight.
arXiv Detail & Related papers (2026-02-24T18:55:18Z)
- Teaching Large Reasoning Models Effective Reflection [62.73646680747003]
Large Reasoning Models (LRMs) have recently shown impressive performance on complex reasoning tasks. However, not all reflections are beneficial; many are superficial, offering little to no improvement over the original answer. We first propose Self-Critique Fine-Tuning (SCFT), a training framework that enhances the model's reflective reasoning ability using only self-generated critiques.
arXiv Detail & Related papers (2026-01-19T04:51:53Z)
- ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents [0.0]
We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms. Our system achieves true zero-shot generalization through pure semantic reasoning. Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization.
arXiv Detail & Related papers (2025-11-18T15:25:05Z)
- LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs [73.27182315028021]
LANPO is a framework that cleanly separates the roles of feedback: language guides exploration, while numerical rewards drive optimization. Our work provides a robust method for integrating historical experiences into the LLM RL loop, creating more effective and data-efficient learning agents.
arXiv Detail & Related papers (2025-10-18T15:51:19Z)
- SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning [25.02860760920562]
Multimodal large language models (MLLMs) have shown promising capabilities in reasoning tasks, but struggle with complex problems requiring explicit self-reflection and self-correction. Existing reflection methods are simplistic and struggle to generate meaningful and instructive feedback. We propose Multimodal Self-Reflection enhanced reasoning with Group Relative Policy Optimization (SRPO), a two-stage reflection-aware reinforcement learning framework.
arXiv Detail & Related papers (2025-06-02T14:21:44Z)
- MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning [33.009759731505746]
Complex tasks involving tool integration pose significant challenges for Large Language Models. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic benchmarks. We propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory.
arXiv Detail & Related papers (2025-05-27T03:37:33Z)
- ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection [60.75785864719726]
We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. We construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks.
arXiv Detail & Related papers (2025-05-22T10:03:05Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction [11.838351314880736]
Instruct-of-Reflection (IoRT) is a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of Large Language Models (LLMs). Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks.
arXiv Detail & Related papers (2025-03-02T14:02:03Z)
- Meta-Reflection: A Feedback-Free Reflection Learning Framework [57.14485943991588]
We propose Meta-Reflection, a feedback-free reflection mechanism that requires only a single inference pass without external feedback. Motivated by the human ability to remember and retrieve reflections from past experiences, Meta-Reflection integrates reflective insights into a codebook. To thoroughly investigate and evaluate the practicality of Meta-Reflection in real-world scenarios, we introduce an industrial e-commerce benchmark named E-commerce Customer Intent Detection.
arXiv Detail & Related papers (2024-12-18T12:20:04Z)
- RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models [5.0741409008225755]
Large language models (LLMs) have emerged as promising tools for solving challenging robotic tasks.
Most existing LLM-based agents lack the ability to retain and learn from past interactions.
We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions.
arXiv Detail & Related papers (2024-09-18T20:03:32Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.