Related papers: How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior

How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior

URL: http://arxiv.org/abs/2505.16067v1
Date: Wed, 21 May 2025 22:35:01 GMT
Title: How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
Authors: Zidi Xiong, Yuping Lin, Wenya Xie, Pengfei He, Jiliang Tang, Himabindu Lakkaraju, Zhen Xiang,
Abstract summary: Memory is a critical component in large language model (LLM)-based agents.<n>We study how memory management choices impact the LLM agents' behavior, especially their long-term performance.
Score: 49.62361184944454
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memory is a critical component in large language model (LLM)-based agents, enabling them to store and retrieve past executions to improve task performance over time. In this paper, we conduct an empirical study on how memory management choices impact the LLM agents' behavior, especially their long-term performance. Specifically, we focus on two fundamental memory operations that are widely used by many agent frameworks-addition, which incorporates new experiences into the memory base, and deletion, which selectively removes past experiences-to systematically study their impact on the agent behavior. Through our quantitative analysis, we find that LLM agents display an experience-following property: high similarity between a task input and the input in a retrieved memory record often results in highly similar agent outputs. Our analysis further reveals two significant challenges associated with this property: error propagation, where inaccuracies in past experiences compound and degrade future performance, and misaligned experience replay, where outdated or irrelevant experiences negatively influence current tasks. Through controlled experiments, we show that combining selective addition and deletion strategies can help mitigate these negative effects, yielding an average absolute performance gain of 10% compared to naive memory growth. Furthermore, we highlight how memory management choices affect agents' behavior under challenging conditions such as task distribution shifts and constrained memory resources. Our findings offer insights into the behavioral dynamics of LLM agent memory systems and provide practical guidance for designing memory components that support robust, long-term agent performance. We also release our code to facilitate further study.

Related papers

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents [26.647812147336538]
We construct a more comprehensive dataset and benchmark to evaluate the memory capability of LLM-based agents.<n>Our dataset incorporates factual memory and reflective memory as different levels, and proposes participation and observation as various interactive scenarios.<n>Based on our dataset, we present a benchmark, named MemBench, to evaluate the memory capability of LLM-based agents from multiple aspects, including their effectiveness, efficiency, and capacity.
arXiv Detail & Related papers (2025-06-20T10:09:23Z)
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents [49.89792845476579]
We introduce a new benchmark for long-range embodied tasks in the Habitat simulator.<n>This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness.
arXiv Detail & Related papers (2025-06-18T17:06:28Z)
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners [51.518410910148816]
Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time.<n>We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents.
arXiv Detail & Related papers (2025-05-17T10:09:11Z)
MemInsight: Autonomous Memory Augmentation for LLM Agents [12.620141762922168]
We propose an autonomous memory augmentation approach, MemInsight, to enhance semantic data representation and retrieval mechanisms.<n>We empirically validate the efficacy of our proposed approach in three task scenarios; conversational recommendation, question answering and event summarization.
arXiv Detail & Related papers (2025-03-27T17:57:28Z)
Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and long-term. We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents. Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models [5.0741409008225755]
Large language models (LLMs) have emerged as promising tools for solving challenging robotic tasks. Most existing LLM-based agents lack the ability to retain and learn from past interactions. We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions.
arXiv Detail & Related papers (2024-09-18T20:03:32Z)
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model [39.169389255970806]
HiAgent is a framework that leverages subgoals as memory chunks to manage the working memory of Large Language Model (LLM)-based agents hierarchically. Results show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8.
arXiv Detail & Related papers (2024-08-18T17:59:49Z)
A Survey on the Memory Mechanism of Large Language Model based Agents [66.4963345269611]
Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems. The key component to support agent-environment interactions is the memory of the agents.
arXiv Detail & Related papers (2024-04-21T01:49:46Z)
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents [49.30553350788524]
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge. Existing RAG models often treat LLMs as passive recipients of information. We introduce ActiveRAG, a multi-agent framework that mimics human learning behavior.
arXiv Detail & Related papers (2024-02-21T06:04:53Z)
Think Before You Act: Decision Transformers with Working Memory [44.18926449252084]
Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. We propose a working memory module to store, blend, and retrieve information for different downstream tasks.
arXiv Detail & Related papers (2023-05-24T01:20:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.