MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
- URL: http://arxiv.org/abs/2601.21468v2
- Date: Wed, 04 Feb 2026 07:03:12 GMT
- Title: MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
- Authors: Yaorui Shi, Shugui Liu, Yu Yang, Wenyu Mao, Yuxin Chen, Qi GU, Hui Su, Xunliang Cai, Xiang Wang, An Zhang
- Abstract summary: We introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets. MemOCR allocates memory space with adaptive information density through visual layout. We train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels.
- Score: 36.52465672754168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-horizon agentic reasoning necessitates effectively compressing growing interaction histories into a limited context window. Most existing memory systems serialize history as text, where token-level cost is uniform and scales linearly with length, often spending scarce budget on low-value details. To address this, we introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets by allocating memory space with adaptive information density through visual layout. Concretely, MemOCR maintains a structured rich-text memory (e.g., headings, highlights) and renders it into an image that the agent consults for memory access, visually prioritizing crucial evidence while aggressively compressing auxiliary details. To ensure robustness across varying memory budgets, we train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels. Across long-context multi-hop and single-hop question-answering benchmarks, MemOCR outperforms strong text-based baselines and achieves more effective context utilization under extreme budgets.
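The abstract's core mechanism, allocating a fixed memory budget unevenly so crucial evidence keeps more detail than auxiliary content, can be illustrated with a small sketch. This is not the paper's implementation: the function `compress_memory`, the importance weights, and the proportional-truncation rule are all assumptions made for illustration only (MemOCR itself renders rich text to an image rather than truncating strings).

```python
# Illustrative sketch (not MemOCR's actual code): budget-aware memory
# compression that gives high-importance entries a larger share of a
# fixed character budget, analogous to adaptive information density.

def compress_memory(entries, budget):
    """Truncate each (text, weight) entry to a share of `budget`
    proportional to its weight, so crucial evidence keeps more detail."""
    total_weight = sum(w for _, w in entries)
    compressed = []
    for text, weight in entries:
        share = max(1, int(budget * weight / total_weight))
        compressed.append(text[:share])
    return compressed

# A high-weight "evidence" entry keeps far more of its text than a
# low-weight "auxiliary" one under the same total budget.
evidence = "KEY: the user asked for budget-aware RL training details " * 3
aux = "greeting and small talk at the start of the session " * 3
result = compress_memory([(evidence, 3.0), (aux, 1.0)], budget=80)
print(len(result[0]), len(result[1]))
```

Under a budget of 80 characters, the 3:1 weighting leaves the evidence entry with three times the space of the auxiliary one, a crude text-only stand-in for the visual prioritization the paper achieves through layout.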
Related papers
- Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents [0.0]
Persistent AI systems face a choice between passing full conversation histories to a long-context large language model (LLM) and maintaining a dedicated memory system that extracts and retrieves structured facts. We compare a fact-based memory system built on the Mem0 framework against long-context LLM inference on three memory-centric benchmarks.
arXiv Detail & Related papers (2026-03-05T05:01:30Z)
- Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory [56.0946692457838]
BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
arXiv Detail & Related papers (2026-02-05T18:57:09Z)
- AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv Detail & Related papers (2026-01-28T08:09:49Z)
- Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management [63.48041801851891]
Fine-Mem is a unified framework designed for fine-grained feedback alignment. Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-13T11:06:17Z)
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z)
- TeleMem: Building Long-Term and Multimodal Memory for Agentic AI [43.36544433800511]
Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories. We propose TeleMem, a unified long-term and multimodal memory system that maintains coherent user profiles through narrative dynamic extraction. TeleMem surpasses the state-of-the-art Mem0 baseline with 19% higher accuracy, 43% fewer tokens, and a 2.1x speedup on the ZH-4O long-term role-play gaming benchmark.
arXiv Detail & Related papers (2025-12-12T11:24:52Z)
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models [0.0]
BudgetMem is a novel memory-augmented architecture that learns what to remember rather than remembering everything. Our system combines selective memory policies with feature-based salience scoring to decide which information merits storage under strict budget constraints. Our work provides a practical pathway for deploying capable long-context systems on modest hardware, democratizing access to advanced language understanding capabilities.
arXiv Detail & Related papers (2025-11-07T01:49:22Z)
- SGMem: Sentence Graph Memory for Long-Term Conversational Agents [14.89396085814917]
We introduce SGMem (Sentence Graph Memory), which represents dialogue as sentence-level graphs within chunked units. We show that SGMem consistently improves accuracy and outperforms strong baselines in long-term conversational question answering.
arXiv Detail & Related papers (2025-09-25T14:21:44Z)
- SCM: Enhancing Large Language Model with Self-Controlled Memory Framework [54.33686574304374]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information. We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.