BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
- URL: http://arxiv.org/abs/2511.04919v1
- Date: Fri, 07 Nov 2025 01:49:22 GMT
- Title: BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
- Authors: Chandra Vamsi Krishna Alla, Harish Naidu Gaddam, Manohar Kommi
- Abstract summary: BudgetMem is a novel memory-augmented architecture that learns what to remember rather than remembering everything. Our system combines selective memory policies with feature-based salience scoring to decide which information merits storage under strict budget constraints. Our work provides a practical pathway for deploying capable long-context systems on modest hardware, democratizing access to advanced language understanding capabilities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) face significant computational and memory constraints when processing long contexts, despite growing demand for applications requiring reasoning over extensive documents, multi-session dialogues, and book-length texts. While recent advances have extended context windows to 100K-1M tokens, such approaches incur prohibitive costs for resource-constrained deployments. We propose BudgetMem, a novel memory-augmented architecture that learns what to remember rather than remembering everything. Our system combines selective memory policies with feature-based salience scoring (entity density, TF-IDF, discourse markers, position bias) to decide which information merits storage under strict budget constraints. Unlike existing retrieval-augmented generation (RAG) systems that store all chunks, BudgetMem employs learned gating mechanisms coupled with BM25 sparse retrieval for efficient information access. Through comprehensive experiments on 700 question-answer pairs across short (237 tokens) and long (5K-10K tokens) documents with Llama-3.2-3B-Instruct, we demonstrate that BudgetMem achieves remarkable results on long documents: only 1.0% F1 score degradation while saving 72.4% memory compared to baseline RAG. We validate our approach through budget sensitivity analysis (testing 7 budget ratios), naive baseline comparisons, and document length analysis, showing that BudgetMem's benefits increase with document length. Our work provides a practical pathway for deploying capable long-context systems on modest hardware, democratizing access to advanced language understanding capabilities.
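The selective-storage idea from the abstract can be sketched in a few lines: score each chunk on the four named feature families, then keep only the top fraction allowed by the budget. This is a minimal, illustrative Python sketch, not the paper's method; the feature weights, the capitalization proxy for entity density, and the small discourse-marker list are all assumptions made here for demonstration.

```python
import math
import re
from collections import Counter

# Illustrative marker list; the paper does not publish its exact set.
DISCOURSE_MARKERS = {"however", "therefore", "thus", "in summary", "importantly"}

def salience(chunk, position, n_chunks, doc_freq, n_docs):
    """Score one chunk: entity density, TF-IDF mass, discourse markers, position bias."""
    tokens = re.findall(r"[A-Za-z']+", chunk)
    if not tokens:
        return 0.0
    # Entity density: crude proxy counting capitalized tokens.
    entity_density = sum(1 for t in tokens if t[0].isupper()) / len(tokens)
    # Mean TF-IDF over the chunk's tokens (doc_freq maps term -> document frequency).
    tf = Counter(t.lower() for t in tokens)
    tfidf = sum(c * math.log(n_docs / (1 + doc_freq.get(t, 0)))
                for t, c in tf.items()) / len(tokens)
    # Discourse-marker count (substring match on the lowercased chunk).
    markers = sum(1 for m in DISCOURSE_MARKERS if m in chunk.lower())
    # Position bias: favor chunks near the start or end of the document.
    pos = position / max(1, n_chunks - 1)
    position_bias = 1.0 - min(pos, 1.0 - pos)
    # Uniform-ish weights chosen arbitrarily for illustration.
    return 0.3 * entity_density + 0.4 * tfidf + 0.2 * markers + 0.1 * position_bias

def select_under_budget(chunks, budget_ratio, doc_freq, n_docs):
    """Keep the top budget_ratio fraction of chunks by salience, in document order."""
    k = max(1, int(len(chunks) * budget_ratio))
    ranked = sorted(range(len(chunks)),
                    key=lambda i: salience(chunks[i], i, len(chunks), doc_freq, n_docs),
                    reverse=True)
    return [chunks[i] for i in sorted(ranked[:k])]
```

In the paper the gate is learned and retrieval over the stored chunks uses BM25; here a fixed linear scorer stands in for the gate, so the sketch only mirrors the budget-constrained selection step.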
Related papers
- Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents
Persistent AI systems face a choice between passing full conversation histories to a long-context large language model (LLM) and maintaining a dedicated memory system that extracts and retrieves structured facts. We compare a fact-based memory system built on the Mem0 framework against long-context LLM inference on three memory-centric benchmarks.
arXiv Detail & Related papers (2026-03-05T05:01:30Z) - Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
arXiv Detail & Related papers (2026-02-05T18:57:09Z) - MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
We introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets. MemOCR allocates memory space with adaptive information density through visual layout. We train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels.
arXiv Detail & Related papers (2026-01-29T09:47:17Z) - Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents
Long-term memory is a critical capability for multimodal large language model (MLLM) agents. Mem-Gallery is a new benchmark for evaluating multimodal long-term conversational memory in MLLM agents.
arXiv Detail & Related papers (2026-01-07T02:03:13Z) - Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
We present a framework for evaluating the abilities of large language models (LLMs) on tasks that require long-term memory and thus long-context reasoning. We then construct BEAM, a new benchmark comprising 100 conversations and 2,000 validated questions. To enhance model performance, we propose LIGHT, a framework inspired by human cognition that equips LLMs with three complementary memory systems.
arXiv Detail & Related papers (2025-10-31T07:29:52Z) - Evaluating Long-Term Memory for Long-Context Question Answering
We present a systematic evaluation of memory-augmented methods using LoCoMo, a benchmark of synthetic long-context dialogues annotated for question-answering tasks. Our findings show that memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy.
arXiv Detail & Related papers (2025-10-27T18:03:50Z) - SGMem: Sentence Graph Memory for Long-Term Conversational Agents
We introduce SGMem (Sentence Graph Memory), which represents dialogue as sentence-level graphs within chunked units. We show that SGMem consistently improves accuracy and outperforms strong baselines in long-term conversational question answering.
arXiv Detail & Related papers (2025-09-25T14:21:44Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
Large Language Models (LLMs) have been widely adopted in conversational agents. MemGAS is a framework that enhances memory consolidation by constructing multi-granularity associations, adaptive selection, and retrieval. Experiments on four long-term memory benchmarks demonstrate that MemGAS outperforms state-of-the-art methods on both question answering and retrieval tasks.
arXiv Detail & Related papers (2025-05-26T06:13:07Z) - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, but their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations.
arXiv Detail & Related papers (2025-04-28T01:46:35Z) - XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies
Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. We introduce XL$^2$Bench, a benchmark for extremely long context understanding with long-range dependencies.
arXiv Detail & Related papers (2024-04-08T12:29:07Z) - SCM: Enhancing Large Language Model with Self-Controlled Memory Framework
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information. We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)