When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
- URL: http://arxiv.org/abs/2602.10560v1
- Date: Wed, 11 Feb 2026 06:14:53 GMT
- Title: When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
- Authors: Leheng Sheng, Yongtao Zhang, Wenchang Ma, Yaorui Shi, Ting Huang, Xiang Wang, An Zhang, Ke Shen, Tat-Seng Chua,
- Abstract summary: We propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning.<n>GRU-Mem generally outperforms the vanilla MemAgent with up to 400% times inference speed acceleration.
- Score: 50.486479460454866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs) as they suffer from performance degradation as the context length grows. Recent work MemAgent has tried to tackle this by processing context chunk-by-chunk in an RNN-like loop and updating a textual memory for final answering. However, this naive recurrent memory update faces two crucial drawbacks: (i) memory can quickly explode because it can update indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation after even sufficient evidence is collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. Specifically, in GRU-Mem, the memory only updates when the update gate is open and the recurrent loop will exit immediately once the exit gate is open. To endow the model with such capabilities, we introduce two reward signals $r^{\text{update}}$ and $r^{\text{exit}}$ within end-to-end RL, rewarding the correct updating and exiting behaviors respectively. Experiments on various long-context reasoning tasks demonstrate the effectiveness and efficiency of GRU-Mem, which generally outperforms the vanilla MemAgent with up to 400\% times inference speed acceleration.
Related papers
- Memory Caching: RNNs with Growing Memory [56.25483647131372]
We introduce Memory Caching (MC), a technique that enhances recurrent models by caching checkpoints of memory states (a.k.a. hidden states)<n>We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules.<n>The results indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and performs better than state-of-the-art recurrent models.
arXiv Detail & Related papers (2026-02-27T18:53:41Z) - CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling [40.705016911274]
We introduce a novel architecture that enables LLMs to handle arbitrarily long sequences with constant memory usage and linear time complexity.<n>CoMeT can be integrated into pre-trained models with only minimal fine-tuning.<n>A model equipped with CoMeT and fine-tuned on 32k contexts can accurately retrieve a passkey from any position within a 1M token sequence.
arXiv Detail & Related papers (2026-02-02T07:49:44Z) - CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models [21.427373172124167]
Large language models (LLMs) excel at single-turn reasoning but often lose accuracy and coherence over extended, multi-turn interactions.<n>We introduce CogMem, a memory-augmented LLM architecture that supports sustained iterative reasoning through structured, persistent memory.<n> Experiments on TurnBench show that this layered design mitigates reasoning failures, controls context growth, and improves consistency across extended reasoning chains.
arXiv Detail & Related papers (2025-12-16T06:01:08Z) - MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning [73.27233666920618]
We propose MemSearcher, an agent workflow that iteratively maintains a compact memory and combines the current turn with it.<n>At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update memory to retain only information essential for solving the task.<n>We introduce multi-context GRPO, an end-to-end RL framework that jointly optimize reasoning, search strategies, and memory management of MemSearcher Agents.
arXiv Detail & Related papers (2025-11-04T18:27:39Z) - ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL [48.214881182054164]
We propose ELMUR, a transformer architecture with structured external memory.<n>ELMUR extends effective horizons up to 100,000 times beyond the attention window.<n>It achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps.
arXiv Detail & Related papers (2025-10-08T15:50:34Z) - Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers [58.98923344096319]
REFORM is a novel inference framework that efficiently handles long contexts through a two-phase approach.<n>It achieves over 50% and 27% performance gains on RULER and BABILong respectively at 1M context length.<n>It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains.
arXiv Detail & Related papers (2025-06-01T23:49:14Z) - RelayAttention for Efficient Large Language Model Serving with Long System Prompts [59.50256661158862]
This paper aims to improve the efficiency of LLM services that involve long system prompts.
handling these system prompts requires heavily redundant memory accesses in existing causal attention algorithms.
We propose RelayAttention, an attention algorithm that allows reading hidden states from DRAM exactly once for a batch of input tokens.
arXiv Detail & Related papers (2024-02-22T18:58:28Z) - Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents [4.371266695484245]
Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights.<n>These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as "I packed my bag"
arXiv Detail & Related papers (2023-09-29T12:59:28Z) - Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) is a blockwise computation of self-attention and feedforward network fusion to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.