ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
- URL: http://arxiv.org/abs/2510.07151v1
- Date: Wed, 08 Oct 2025 15:50:34 GMT
- Title: ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
- Authors: Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov
- Abstract summary: We propose ELMUR, a transformer architecture with structured external memory. ELMUR extends effective horizons up to 100,000 times beyond the attention window. It achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps.
- Score: 48.214881182054164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle to retain and leverage long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through a Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
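The update rule in the abstract is concrete enough to sketch. Below is a minimal PyTorch illustration of a layer-local memory in the spirit of ELMUR: tokens read from a fixed set of slots via cross-attention, slots write back via cross-attention, and slots are then updated LRU-style, here by hard-replacing the stalest slot and convex-blending the rest, $m \leftarrow (1-\alpha)\,m + \alpha\,c$. The slot count, blending weight, and exact read/write wiring are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class ELMURLayerSketch(nn.Module):
    """Layer-local external memory sketch: read/write cross-attention plus an
    LRU-style slot update. Hyperparameters and wiring are illustrative only,
    not the paper's reference implementation."""

    def __init__(self, d_model: int = 256, num_slots: int = 16,
                 n_heads: int = 4, alpha: float = 0.5):
        super().__init__()
        self.read = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.write = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.init_mem = nn.Parameter(0.02 * torch.randn(num_slots, d_model))
        self.alpha = alpha  # convex blending weight (assumed value)

    def init_state(self, batch_size: int):
        mem = self.init_mem.unsqueeze(0).expand(batch_size, -1, -1).contiguous()
        age = torch.zeros(batch_size, self.init_mem.size(0),
                          device=self.init_mem.device)  # steps since last write
        return mem, age

    def forward(self, x, mem, age):
        # x: (B, T, d) tokens in the current window; mem: (B, S, d) memory slots.
        # Read: tokens cross-attend to the memory slots.
        x = x + self.read(query=x, key=mem, value=mem)[0]
        # Write: each slot cross-attends to the tokens, giving a candidate content.
        cand = self.write(query=mem, key=x, value=x)[0]
        # LRU update: convex-blend every slot, then hard-replace the stalest one.
        new_mem = (1 - self.alpha) * mem + self.alpha * cand
        stale = age.argmax(dim=-1)                          # (B,) oldest slot index
        b = torch.arange(mem.size(0), device=mem.device)
        new_mem[b, stale] = cand[b, stale]
        age = age + 1
        age[b, stale] = 0.0
        return x, new_mem, age

# Rolling a long episode through fixed-size windows:
layer = ELMURLayerSketch()
mem, age = layer.init_state(batch_size=2)
for _ in range(4):                        # four consecutive windows
    x = torch.randn(2, 8, 256)
    x, mem, age = layer(x, mem, age)      # memory persists across windows
```

Because each layer owns its own slots, the state carried between windows is a fixed num_slots × d_model tensor per layer, which is how the effective horizon can grow far beyond the attention window without growing the context.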
Related papers
- RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies [54.23445842621374]
Memory is critical for long-horizon and history-dependent robotic manipulation. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms. We introduce RoboMME, a large-scale standardized benchmark for evaluating and advancing VLA models.
arXiv Detail & Related papers (2026-03-04T21:59:32Z) - Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z) - Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [89.55738101744657]
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless. We present Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory.
arXiv Detail & Related papers (2025-08-27T12:26:55Z) - MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation [59.31354761628506]
Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models typically overlook it. We propose MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. We evaluate it on 150+ simulation and real-world tasks across three robots.
arXiv Detail & Related papers (2025-08-26T17:57:16Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - ATLAS: Learning to Optimally Memorize the Context at Test Time [31.41718170413687]
ATLAS is a long-term memory module with high capacity that learns to memorize the context. We present a new family of Transformer-like architectures, called DeepTransformers, that are strict generalizations of the original Transformer architecture.
arXiv Detail & Related papers (2025-05-29T17:57:16Z) - R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression [24.825945729508682]
We propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval. R$^3$Mem employs virtual memory tokens to compress and encode infinitely long histories, further enhanced by a hierarchical compression strategy. Experiments demonstrate that our memory design achieves state-of-the-art performance in long-context language modeling and retrieval-augmented generation tasks.
arXiv Detail & Related papers (2025-02-21T21:39:00Z) - HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing [33.720656946186885]
Hierarchical Memory Transformer (HMT) is a novel framework that facilitates a model's long-context processing ability. HMT consistently improves the long-context processing ability of existing models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z) - LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens; a loose sketch of this look-ahead idea follows this list.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)