Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
- URL: http://arxiv.org/abs/2602.18493v1
- Date: Fri, 13 Feb 2026 16:54:23 GMT
- Title: Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
- Authors: Kehao Zhang, Shangtong Gui, Sheng Yang, Wei Chen, Yang Feng,
- Abstract summary: We propose an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD. Across 13 datasets spanning Ledger-QA, Test-Time Learning, and Accurate Retrieval, UMA substantially outperforms long-context and RAG baselines on dynamic reasoning and learning tasks.
- Score: 18.621823772319154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent updates. We propose the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, update, delete, reorganize) over key-value entries, enabling proactive consolidation during streaming. To evaluate long-horizon memory behavior, we introduce Ledger-QA, a diagnostic benchmark for continuous state tracking where answers are latent values derived from accumulated updates rather than local span retrieval. Across 13 datasets spanning Ledger-QA, Test-Time Learning, and Accurate Retrieval, UMA substantially outperforms long-context and RAG baselines on dynamic reasoning and learning tasks while remaining competitive on standard retrieval benchmarks, underscoring the importance of learned, end-to-end memory management.
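For intuition, the sketch below illustrates the dual memory representation the abstract describes: a compact core summary for global context plus a key-value Memory Bank with create/update/delete/reorganize operations. The class and method names are illustrative assumptions, not the authors' released code; in UMA itself these edits would be actions emitted by the single RL-trained policy while streaming, rather than hand-written rules.

```python
# Minimal sketch of a UMA-style dual memory state, assuming a key-value
# Memory Bank with explicit CRUD plus a compact core summary.
# All names here are hypothetical illustrations of the described design.
from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Structured key-value store that the agent edits during streaming."""
    entries: dict = field(default_factory=dict)

    def create(self, key: str, value: str) -> None:
        self.entries[key] = value

    def update(self, key: str, value: str) -> None:
        # Overwrite the latest state for a tracked entity (e.g. a ledger balance).
        self.entries[key] = value

    def delete(self, key: str) -> None:
        self.entries.pop(key, None)

    def reorganize(self, merge_from: str, merge_into: str) -> None:
        # Fold one entry into another, e.g. after an alias or contradiction is resolved.
        if merge_from in self.entries:
            old = self.entries.pop(merge_from)
            existing = self.entries.get(merge_into, "")
            self.entries[merge_into] = f"{existing} {old}".strip()


@dataclass
class UnifiedMemoryState:
    """Dual representation: compact core summary + structured Memory Bank."""
    core_summary: str = ""
    bank: MemoryBank = field(default_factory=MemoryBank)

    def consolidate(self, chunk: str, summarize) -> None:
        # Proactive consolidation on each incoming chunk; `summarize` stands in
        # for an LLM call that refreshes the global summary.
        self.core_summary = summarize(self.core_summary, chunk)
```

In the paper's framing, choosing which of these operations to apply, and when, is itself learned end to end together with question answering, which is what distinguishes UMA from passive long-context or RAG pipelines.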
Related papers
- AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv Detail & Related papers (2026-01-28T08:09:49Z) - Continuum Memory Architectures for Long-Horizon LLM Agents [0.0]
Retrieval-augmented generation (RAG) has become the default strategy for providing large language model (LLM) agents with contextual knowledge. We define the Continuum Memory Architecture (CMA), a class of systems that maintain and update internal state across interactions. We show consistent behavioral advantages on tasks that expose RAG's structural inability to accumulate, mutate, or disambiguate memory.
arXiv Detail & Related papers (2026-01-14T22:40:35Z) - Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z) - Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory [89.65731902036669]
Evo-Memory is a streaming benchmark and framework for evaluating self-evolving memory in large language model (LLM) agents. We evaluate over ten representative memory modules across 10 diverse multi-turn goal-oriented and single-turn reasoning and QA datasets.
arXiv Detail & Related papers (2025-11-25T21:08:07Z) - Evaluating Long-Term Memory for Long-Context Question Answering [100.1267054069757]
We present a systematic evaluation of memory-augmented methods using LoCoMo, a benchmark of synthetic long-context dialogues annotated for question-answering tasks. Our findings show that memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy.
arXiv Detail & Related papers (2025-10-27T18:03:50Z) - SEDM: Scalable Self-Evolving Distributed Memory for Agents [23.182291416527764]
SEDM is a verifiable and adaptive framework that transforms memory from a passive repository into an active, self-optimizing component. We show that SEDM improves reasoning accuracy while reducing token overhead compared with strong memory baselines. Results highlight SEDM as a scalable and sustainable memory mechanism for open-ended multi-agent collaboration.
arXiv Detail & Related papers (2025-09-11T14:37:37Z) - Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions [22.190297901876278]
We identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and selective forgetting. Existing benchmarks either rely on limited context lengths or are tailored for static, long-context settings like book-based QA. We introduce MemoryAgentBench, a new benchmark specifically designed for memory agents.
arXiv Detail & Related papers (2025-07-07T17:59:54Z) - From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents [79.87304940020256]
Large Language Models (LLMs) have been widely adopted in conversational agents. MemGAS is a framework that enhances memory consolidation by constructing multi-granularity association, adaptive selection, and retrieval. Experiments on four long-term memory benchmarks demonstrate that MemGAS outperforms state-of-the-art methods on both question answering and retrieval tasks.
arXiv Detail & Related papers (2025-05-26T06:13:07Z) - SCM: Enhancing Large Language Model with Self-Controlled Memory Framework [54.33686574304374]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information. We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)