MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
- URL: http://arxiv.org/abs/2603.00680v1
- Date: Sat, 28 Feb 2026 14:43:02 GMT
- Title: MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
- Authors: Ruoran Li, Xinghua Zhang, Haiyang Yu, Shitong Duan, Xiang Li, Wenxin Xiang, Chonghua Liao, Xudong Guo, Yongbin Li, Jinli Suo
- Abstract summary: Existing methods typically introduce an external memory module and look up the relevant information from the stored memory. We propose the self-memory policy optimization algorithm (MemPO), which enables the agent to autonomously summarize and manage its memory. MemPO achieves absolute F1 score gains of 25.98% over the base model and 7.1% over the previous SOTA baseline, while reducing token usage by 67.58% and 73.12%, respectively.
- Score: 52.00646524941419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-horizon agents face the challenge of growing context size during interaction with the environment, which degrades performance and stability. Existing methods typically introduce an external memory module and look up the relevant information from the stored memory, which prevents the model itself from proactively managing its memory content and aligning it with the agent's overarching task objectives. To address these limitations, we propose the self-memory policy optimization algorithm (MemPO), which enables the agent (policy model) to autonomously summarize and manage its memory during interaction with the environment. By improving the credit assignment mechanism based on memory effectiveness, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments and analyses confirm that MemPO achieves absolute F1 score gains of 25.98% over the base model and 7.1% over the previous SOTA baseline, while reducing token usage by 67.58% and 73.12%, respectively.
Related papers
- AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations [61.6579785305668]
AMemGym is an interactive environment enabling on-policy evaluation and optimization for memory-driven personalization. Our framework provides a scalable, diagnostically rich environment for advancing memory capabilities in conversational agents.
arXiv Detail & Related papers (2026-03-02T15:15:11Z)
- UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory [46.87954895079213]
Self-evolving memory serves as the trainable parameters for Large Language Models (LLMs). Existing methods predominantly optimize memory management while treating memory extraction as a static process. We propose Unified Memory Extraction and Management (UMEM) to jointly optimize a Large Language Model to simultaneously extract and manage memories.
arXiv Detail & Related papers (2026-02-11T08:58:41Z)
- Mem-T: Densifying Rewards for Long-Horizon Memory Agents [23.19373149519922]
We introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. We also propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory operation tree backpropagation and hindsight credit assignment.
arXiv Detail & Related papers (2026-01-30T14:23:33Z)
- MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models [40.965722377085456]
We introduce MemoryRewardBench, the first benchmark to systematically study the ability of reward models to evaluate memory quality. Evaluations on 13 cutting-edge RMs indicate a diminishing performance gap between open-source and proprietary models.
arXiv Detail & Related papers (2026-01-17T09:04:53Z)
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z)
- Beyond Heuristics: A Decision-Theoretic Framework for Agent Memory Management [49.71055327567513]
We argue that memory management should be viewed as a sequential decision-making problem under uncertainty. Our contribution is not a new algorithm, but a principled reframing that clarifies the limitations of approaches.
arXiv Detail & Related papers (2025-12-25T08:23:03Z)
- Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents [2.28438857884398]
This paper introduces the Memory-Aware Retention (MaRS), a novel framework for human-centered memory management in generative agents. We present the Forgetful but Faithful Agent (FiFA) benchmark, a comprehensive evaluation framework that assesses agent performance across narrative coherence, goal completion, social recall accuracy, privacy preservation, and cost efficiency. Our work establishes new benchmarks for memory-budgeted agent evaluation and provides practical guidelines for deploying generative agents in resource-constrained, privacy-sensitive environments.
arXiv Detail & Related papers (2025-12-14T21:40:07Z)
- O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents [60.1848551962911]
O-Mem is a novel memory framework based on active user profiling. O-Mem supports hierarchical retrieval of persona attributes and topic-related context.
arXiv Detail & Related papers (2025-11-17T16:55:19Z)
- Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments. In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge. The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.