Related papers: H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents

H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents

URL: http://arxiv.org/abs/2509.12810v1
Date: Tue, 16 Sep 2025 08:30:08 GMT
Title: H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents
Authors: Shicheng Ye, Chao Yu, Kaiqiang Ke, Chengdong Xu, Yinqi Wei,
Abstract summary: Large language model (LLM)-based agents have shown strong potential in multi-task scenarios.<n>Existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer.<n>We propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer.
Score: 3.9054156855794973
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-based agents have shown strong potential in multi-task scenarios, owing to their ability to transfer knowledge across diverse tasks. However, existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer. In this work, we propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer by decoupling high-level planning memory from low-level execution memory. To construct and refine these hierarchical memories, we introduce Hierarchical Hindsight Reflection (H$^2$R), a mechanism that distills reusable and hierarchical knowledge from past agent-environment interactions. At test time, H$^2$R performs retrievals of high-level and low-level memories separately, allowing LLM-based agents to efficiently access and utilize task-relevant knowledge for new tasks.Experimental results across two benchmarks demonstrate that H$^2$R can improve generalization and decision-making performance, outperforming prior baselines such as Expel.

Related papers

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory [89.65731902036669]
Evo-Memory is a streaming benchmark and framework for evaluating self-evolving memory in large language model (LLM) agents.<n>We evaluate over ten representative memory modules and evaluate them across 10 diverse multi-turn goal-oriented and single-turn reasoning and QA datasets.
arXiv Detail & Related papers (2025-11-25T21:08:07Z)
A Benchmark for Procedural Memory Retrieval in Language Agents [0.023227405857540805]
Current AI agents excel in familiar settings, but fail sharply when faced with novel tasks with unseenProc.<n>We present the first benchmark that isolates procedural memory retrieval from task execution.<n>Our results expose a clear generalization cliff: embedding-based methods perform strongly on familiar contexts, yet degrade considerably on novel ones.
arXiv Detail & Related papers (2025-11-21T08:08:53Z)
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging [33.22889542330089]
Internal representations in large language models (LLMs) serve as reliable proxies of learned knowledge.<n>We propose RECALL, a representation-aware model merging framework for continual learning without access to historical data.
arXiv Detail & Related papers (2025-10-23T12:17:37Z)
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory [57.517214479414726]
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences.<n>At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time.<n>We introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience.
arXiv Detail & Related papers (2025-09-29T17:51:03Z)
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [59.16831804985279]
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless.<n>Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and learned.<n>We present Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory.
arXiv Detail & Related papers (2025-08-27T12:26:55Z)
Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents [19.04968632268433]
We propose a hierarchical memory architecture for Large Language Model Agents (LLM Agents)<n>Each memory vector is embedded with a positional index encoding pointing to its semantically related sub-memories in the next layer.<n>During the reasoning phase, an index-based routing mechanism enables efficient, layer-by-layer retrieval without performing exhaustive similarity computations.
arXiv Detail & Related papers (2025-07-23T12:45:44Z)
Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger [51.01841635655944]
Recent advancements in Large Vision Language Models (LVLMs) have significantly improved performance in Visual Question Answering (VQA) tasks.<n>Existing methods still face challenges, such as the scarcity of knowledge with reasoning examples and erratic responses from retrieved knowledge.<n>We propose a multimodal RAG framework, termed RCTS, which enhances LVLMs by constructing a Reasoning Context-enriched knowledge base and a Tree Search re-ranking method.
arXiv Detail & Related papers (2025-06-09T14:00:57Z)
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning [60.84901522792042]
Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs)<n>We propose R1, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state.<n>R1- can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
arXiv Detail & Related papers (2025-05-28T08:17:57Z)
Efficiently Enhancing General Agents With Hierarchical-categorical Memory [0.5919433278490629]
We introduce EHC, a general agent capable of learning without parameter updates.<n>EHC consists of a Hierarchical Memory Retrieval (HMR) module and a Task-Category Oriented Experience Learning (TOEL) module.
arXiv Detail & Related papers (2025-05-28T06:12:51Z)
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models [6.380729797938521]
retrieval-augmented generation (RAG) has become the dominant way to introduce new information.<n>Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some gaps, namely sense-making and associativity.<n>We propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks.
arXiv Detail & Related papers (2025-02-20T18:26:02Z)
mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge. We propose a novel framework called textbfRetrieval-textbfReftextbfAugmented textbfGeneration (mR$2$AG) which achieves adaptive retrieval and useful information localization. mR$2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA
arXiv Detail & Related papers (2024-11-22T16:15:50Z)
MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation [60.04380907045708]
Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem.<n>We propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval.<n>MemoRAG achieves superior performances across a variety of long-context evaluation tasks.
arXiv Detail & Related papers (2024-09-09T13:20:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.