Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents
- URL: http://arxiv.org/abs/2601.14287v1
- Date: Wed, 14 Jan 2026 04:42:15 GMT
- Title: Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents
- Authors: Xiucheng Xu, Bingbing Xu, Xueyun Tian, Zihe Huang, Rongxin Chen, Yunfan Li, Huawei Shen
- Abstract summary: External memory systems are pivotal for enabling Large Language Model (LLM) agents to maintain persistent knowledge and perform long-horizon decision-making. Existing paradigms typically follow a two-stage process: computationally expensive memory construction followed by naive retrieval-augmented generation. We propose CoM (Chain-of-Memory), a novel framework that advocates for a paradigm shift toward lightweight construction paired with sophisticated utilization.
- Score: 26.39049374286037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: External memory systems are pivotal for enabling Large Language Model (LLM) agents to maintain persistent knowledge and perform long-horizon decision-making. Existing paradigms typically follow a two-stage process: computationally expensive memory construction (e.g., structuring data into graphs) followed by naive retrieval-augmented generation. However, our empirical analysis reveals two fundamental limitations: complex construction incurs high costs with marginal performance gains, and simple context concatenation fails to bridge the gap between retrieval recall and reasoning accuracy. To address these challenges, we propose CoM (Chain-of-Memory), a novel framework that advocates for a paradigm shift toward lightweight construction paired with sophisticated utilization. CoM introduces a Chain-of-Memory mechanism that organizes retrieved fragments into coherent inference paths through dynamic evolution, utilizing adaptive truncation to prune irrelevant noise. Extensive experiments on the LongMemEval and LoCoMo benchmarks demonstrate that CoM outperforms strong baselines with accuracy gains of 7.5%-10.4%, while drastically reducing computational overhead: it incurs approximately 2.7% of the token consumption and 6.0% of the latency of complex memory architectures.
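The abstract describes the chain-of-memory loop only at a high level. Below is a minimal Python sketch of the lightweight-construction / iterative-utilization idea; all names (`build_memory`, `chain_of_memory`, `retrieve`, `judge_relevant`), the loop structure, and the stopping rule are illustrative assumptions, not the paper's actual algorithm.

```python
def build_memory(session_turns):
    """Lightweight construction: store raw turns as-is (no graph building)."""
    return list(session_turns)

def chain_of_memory(query, memory, retrieve, judge_relevant, max_steps=4):
    """Iteratively grow a coherent inference path from retrieved fragments.

    `retrieve(context, memory, top_k)` and `judge_relevant(query, fragment)`
    are hypothetical callables, e.g. a vector retriever and an LLM-based
    relevance check; both are stand-ins, not the paper's API.
    """
    path, context = [], query
    for _ in range(max_steps):
        fragments = retrieve(context, memory, top_k=5)
        # Adaptive truncation: prune fragments judged to be noise.
        kept = [f for f in fragments if f not in path and judge_relevant(query, f)]
        if not kept:
            break  # no new relevant evidence; stop evolving the chain
        path.extend(kept)
        # Dynamic evolution: fold the new evidence into the next retrieval query.
        context = query + " " + " ".join(kept)
    return path
```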
Related papers
- HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling [7.24393498822329]
HyMem is a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. We show that HyMem achieves strong performance on both the LoCoMo and LongMemEval benchmarks, outperforming the full-context baseline while reducing computational cost by 92.6%.
arXiv Detail & Related papers (2026-02-15T00:06:19Z)
- MemFly: On-the-Fly Memory Optimization via Information Bottleneck [35.420309099411874]
Long-term memory enables large language model agents to tackle complex tasks through historical interactions. Existing frameworks encounter a dilemma between compressing redundant information efficiently and maintaining precise retrieval for downstream tasks. MemFly is a framework grounded in information bottleneck principles that facilitates on-the-fly memory evolution for LLMs. MemFly substantially outperforms state-of-the-art baselines in memory coherence, response fidelity, and accuracy.
arXiv Detail & Related papers (2026-02-08T09:37:25Z)
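As a loose illustration of the information-bottleneck framing in the MemFly summary above, the sketch below greedily keeps memory entries that are predictive of recent queries while penalizing redundancy with entries already kept. The scoring rule, the `beta` trade-off, and all names are assumptions for illustration, not MemFly's method.

```python
import numpy as np

def ib_style_select(memory_embs, query_embs, budget, beta=0.5):
    """Greedy selection with an information-bottleneck flavor:
    relevance to recent queries minus beta * redundancy with entries
    already kept. Assumes unit-normalized embedding rows; an
    illustrative stand-in, not MemFly's actual objective."""
    kept = []
    for _ in range(min(budget, len(memory_embs))):
        best_i, best_score = None, -np.inf
        for i, m in enumerate(memory_embs):
            if i in kept:
                continue
            relevance = max(float(m @ q) for q in query_embs)
            redundancy = max((float(m @ memory_embs[j]) for j in kept), default=0.0)
            score = relevance - beta * redundancy
            if score > best_score:
                best_i, best_score = i, score
        kept.append(best_i)
    return kept  # indices of memory entries to retain
```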
- AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv Detail & Related papers (2026-01-28T08:09:49Z)
- MemRec: Collaborative Memory-Augmented Agentic Recommender System [57.548438733740504]
We propose MemRec, a framework that architecturally decouples reasoning from memory management. MemRec introduces a dedicated LM_Mem to manage a dynamic collaborative memory graph. It achieves state-of-the-art performance on four benchmarks.
arXiv Detail & Related papers (2026-01-13T18:51:16Z)
- SimpleMem: Efficient Lifelong Memory for LLM Agents [73.74399447715052]
We introduce SimpleMem, an efficient memory framework based on semantic lossless compression. We propose a three-stage pipeline designed to maximize information density and token utilization. Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost.
arXiv Detail & Related papers (2026-01-05T21:02:49Z)
- MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning [73.27233666920618]
We propose MemSearcher, an agent workflow that iteratively maintains a compact memory and combines the current turn with it. At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update memory to retain only information essential for solving the task. We introduce multi-context GRPO, an end-to-end RL framework that jointly optimizes the reasoning, search strategies, and memory management of MemSearcher agents.
arXiv Detail & Related papers (2025-11-04T18:27:39Z)
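The MemSearcher summary above describes a concrete turn-level loop: fuse the question with memory, reason, optionally search, and rewrite memory to keep only essentials. A minimal sketch of such a loop follows; `llm` and `search` are hypothetical callables, and the prompt formats are invented for illustration.

```python
def memsearcher_style_turn(question, memory, llm, search, max_rounds=3):
    """One turn of a MemSearcher-style agent (illustrative sketch)."""
    for _ in range(max_rounds):
        prompt = (f"Memory:\n{memory}\n\nQuestion: {question}\n"
                  "Reply with 'SEARCH <query>' or 'ANSWER <text>'.")
        action = llm(prompt).strip()
        if action.startswith("SEARCH"):
            results = search(action[len("SEARCH"):].strip())
            # Memory update: keep only information essential to the task.
            memory = llm("Compress into essential notes:\n"
                         f"{memory}\n{results}")
        else:
            return action[len("ANSWER"):].strip(), memory
    # Fall back to answering from whatever memory has accumulated.
    return llm(f"Memory:\n{memory}\n\nAnswer: {question}"), memory
```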
- Overflow Prevention Enhances Long-Context Recurrent LLMs [81.71585057993074]
A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, their long-context capacity remains underutilized.
arXiv Detail & Related papers (2025-05-12T17:45:05Z)
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory [0.5584627289325719]
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, but their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations.
arXiv Detail & Related papers (2025-04-28T01:46:35Z)
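The Mem0 summary names three operations: extract, consolidate, retrieve. The class below sketches one plausible shape for that loop; the class name, prompts, and similarity threshold are all assumptions, not Mem0's actual API.

```python
class ExtractConsolidateRetrieve:
    """Illustrative memory store in the spirit of the summary above."""

    def __init__(self, llm, embed):
        self.llm, self.embed = llm, embed
        self.store = []  # list of (fact, embedding) pairs

    def add(self, turn):
        # Extract: pull salient facts out of the new conversation turn.
        facts = self.llm(f"List salient facts, one per line:\n{turn}")
        for fact in filter(None, map(str.strip, facts.splitlines())):
            e = self.embed(fact)
            # Consolidate: skip facts already covered by stored memories
            # (cosine similarity on unit-normalized embeddings, assumed).
            if not any(float(e @ old_e) > 0.9 for _, old_e in self.store):
                self.store.append((fact, e))

    def retrieve(self, query, k=5):
        # Retrieve: return the k facts most similar to the query.
        q = self.embed(query)
        ranked = sorted(self.store, key=lambda fe: -float(fe[1] @ q))
        return [fact for fact, _ in ranked[:k]]
```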
- Rethinking Token Reduction for State Space Models [47.00760373683448]
We propose a tailored, unified post-training token reduction method for State Space Models (SSMs). Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging. Our method improves the average accuracy by 5.7% to 13.1% on six benchmarks with Mamba-2 compared to existing methods.
arXiv Detail & Related papers (2024-10-16T00:06:13Z)
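The entry above combines importance-based pruning with similarity-based merging. One way such a unified step can look, with the importance scores, threshold, and averaging rule chosen purely for illustration (not the paper's exact method):

```python
import numpy as np

def prune_and_merge(hidden, importance, keep_ratio=0.5, sim_thresh=0.8):
    """Drop low-importance tokens, but merge a dropped token into its most
    similar kept token when they are close enough. `hidden` is (seq, dim)
    with unit-normalized rows; `importance` is (seq,)."""
    order = np.argsort(-importance)
    n_keep = max(1, int(len(order) * keep_ratio))
    kept, dropped = order[:n_keep], order[n_keep:]
    out = {int(i): hidden[i].copy() for i in kept}
    for d in dropped:
        sims = hidden[kept] @ hidden[d]
        if sims.max() > sim_thresh:
            j = int(kept[np.argmax(sims)])
            out[j] = (out[j] + hidden[d]) / 2.0  # merge (unweighted average)
        # else: prune outright (token is simply removed)
    # Reassemble kept tokens in their original sequence order.
    return np.stack([out[i] for i in sorted(out)])
```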
- ThinK: Thinner Key Cache by Query-Driven Pruning [63.13363917871414]
Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications. This paper focuses on the long-context scenario, addressing the inefficiencies in KV cache memory consumption during inference. We propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels.
arXiv Detail & Related papers (2024-07-30T17:59:08Z)
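To make the ThinK entry concrete: query-driven channel pruning of the key cache can be sketched as below. The channel-scoring criterion here (summed |Q * K| contribution to attention logits) is an assumption for illustration, not necessarily ThinK's exact criterion.

```python
import numpy as np

def prune_key_channels(K, Q, keep_ratio=0.6):
    """Query-driven key-cache channel pruning (illustrative).
    K: (seq, d) cached keys; Q: (q_len, d) recent queries."""
    # Score each channel by its total contribution to Q @ K^T magnitudes.
    contrib = np.abs(Q[:, None, :] * K[None, :, :]).sum(axis=(0, 1))  # (d,)
    d_keep = max(1, int(K.shape[1] * keep_ratio))
    keep = np.sort(np.argsort(-contrib)[:d_keep])
    # Queries must be sliced to the same channels at attention time: Q[:, keep].
    return K[:, keep], keep
```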
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, a minimal number of late pre-trained layers alleviates the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
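The SHERL entry sketches a two-route design: a frozen early route whose intermediate outputs are consolidated, and a trainable late route of only a few pre-trained layers. A rough PyTorch sketch follows; the mean-pooling consolidation is a stand-in for the paper's anti-redundancy operation, and all layer output shapes are assumed compatible.

```python
import torch

def sherl_style_forward(layers, x, n_late=2):
    """Two-route forward pass (illustrative).
    Early route: frozen layers run under no_grad, so no activations are
    stored for backprop and peak memory stays low. Late route: only the
    last `n_late` pre-trained layers run with gradients enabled."""
    feats = []
    with torch.no_grad():
        h = x
        for layer in layers[:-n_late]:
            h = layer(h)
            feats.append(h)
    # Stand-in for the anti-redundancy consolidation: average the
    # intermediate outputs (the paper's actual operation likely differs).
    h = torch.stack(feats).mean(dim=0)
    for layer in layers[-n_late:]:  # trainable late route
        h = layer(h)
    return h
```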