MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
- URL: http://arxiv.org/abs/2603.03379v1
- Date: Tue, 03 Mar 2026 02:57:38 GMT
- Title: MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
- Authors: Jiejun Tan, Zhicheng Dou, Liancheng Zhang, Yuyang Hu, Yiruo Cheng, Ji-Rong Wen
- Abstract summary: Large Language Models (LLMs) are increasingly used for long-duration tasks. Current methods face a trade-off between cost and accuracy. MemSifter is a novel framework that offloads the memory retrieval process to a small-scale proxy model.
- Score: 78.46301394559903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Furthermore, relying on the working LLM to process all memories is computationally expensive and slow. To address these limitations, we propose MemSifter, a novel framework that offloads the memory retrieval process to a small-scale proxy model. Instead of increasing the burden on the primary working LLM, MemSifter uses a smaller model to reason about the task before retrieving the necessary information. This approach requires no heavy computation during the indexing phase and adds minimal overhead during inference. To optimize the proxy model, we introduce a memory-specific Reinforcement Learning (RL) training paradigm. We design a task-outcome-oriented reward based on the working LLM's actual performance in completing the task. The reward measures the actual contribution of retrieved memories through multiple interactions with the working LLM, and discriminates among retrieval rankings by assigning stepped, decreasing contributions. Additionally, we employ training techniques such as Curriculum Learning and Model Merging to improve performance. We evaluated MemSifter on eight LLM memory benchmarks, including Deep Research tasks. The results demonstrate that our method meets or exceeds the performance of existing state-of-the-art approaches in both retrieval accuracy and final task completion. MemSifter offers an efficient and scalable solution for long-term LLM memory. We have open-sourced the model weights, code, and training data to support further research.
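The reward design described above can be sketched as a small function. This is a hypothetical illustration, not the paper's released code: the working LLM is abstracted as a `task_success` callback (an assumption), and the "stepped decreasing contributions" are modeled as a geometric decay over ranks so that memories placed earlier in the proxy model's ranking earn more credit for their marginal contribution to task success.

```python
from typing import Callable, List


def outcome_reward(
    ranked_memories: List[str],
    task_success: Callable[[List[str]], float],
    decay: float = 0.5,
) -> float:
    """Hypothetical sketch of a task-outcome-oriented retrieval reward.

    Each prefix of the proxy model's ranking is handed to the working LLM,
    abstracted here as `task_success` (returns a score in [0, 1]). The
    marginal gain contributed by the memory at rank i is weighted by a
    stepped, geometrically decreasing factor, so correct memories placed
    earlier in the ranking receive more credit.
    """
    reward = 0.0
    prev_score = task_success([])  # baseline: task attempted with no memory
    for i in range(len(ranked_memories)):
        score = task_success(ranked_memories[: i + 1])
        contribution = score - prev_score  # marginal gain from rank-i memory
        reward += (decay ** i) * contribution  # stepped decreasing weight
        prev_score = score
    return reward
```

Under this scheme, a ranking that surfaces the decisive memory first scores strictly higher than one that buries it, which gives the RL signal the ranking discrimination the abstract describes.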
Related papers
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z) - Reversing Large Language Models for Efficient Training and Fine-Tuning [24.232966507637673]
Large Language Models (LLMs) are known for their expensive and time-consuming training. We introduce memory-efficient, reversible architectures for LLMs inspired by symmetric and symplectic differential equations. Our results show comparable or improved performance on several datasets and benchmarks.
arXiv Detail & Related papers (2025-11-27T19:32:15Z) - Memento: Fine-tuning LLM Agents without Fine-tuning LLMs [36.3424780932712]
We introduce a novel learning paradigm for Adaptive Large Language Model (LLM) agents. Our method enables low-cost continual adaptation via memory-based online reinforcement learning. We instantiate our agent model in the deep research setting, namely Memento, which attains top-1 on GAIA validation.
arXiv Detail & Related papers (2025-08-22T07:25:30Z) - Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework [33.739298910759544]
We propose to optimize LLM-based agents with an adaptive and data-driven memory framework by modeling memory cycles. Specifically, we design an MoE gate function to facilitate memory retrieval, propose a learnable aggregation process to improve memory utilization, and develop task-specific reflection to adapt memory storage.
arXiv Detail & Related papers (2025-08-15T12:22:52Z) - Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z) - Cost-Optimal Grouped-Query Attention for Long-Context Modeling [45.981681856747365]
Grouped-Query Attention (GQA) is a widely adopted strategy for reducing the computational cost of attention layers in large language models. We analyze the relationship among context length, model size, GQA configuration, and model loss. We propose a recipe for deriving cost-optimal GQA configurations.
arXiv Detail & Related papers (2025-03-12T17:50:42Z) - CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models [22.93893181000535]
Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences. To address these challenges, this paper proposes the Compression Memory Training (CMT) method. CMT compresses and extracts information from new documents to be stored in a memory bank. When answering queries related to these new documents, the model aggregates the relevant document memories from the memory bank to better answer user questions.
arXiv Detail & Related papers (2024-12-10T10:35:19Z) - MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing large language models (LLMs) by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.