Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
- URL: http://arxiv.org/abs/2602.06025v1
- Date: Thu, 05 Feb 2026 18:57:09 GMT
- Title: Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
- Authors: Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang
- Abstract summary: BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
- Score: 56.0946692457838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost, which is implemented as a compact neural policy trained with reinforcement learning. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
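The abstract describes the router only at a high level: a compact policy that picks one of three budget tiers (Low/Mid/High) per memory module, trained with reinforcement learning against a performance-cost trade-off. The following is a minimal sketch of that idea, not the paper's actual implementation; the module names, cost values, reward shape (task score minus a cost penalty), and plain REINFORCE update are all illustrative assumptions.

```python
import math
import random

TIERS = ["low", "mid", "high"]                   # budget tiers named in the abstract
MODULES = ["extract", "summarize", "retrieve"]   # hypothetical memory modules
COSTS = {"low": 1.0, "mid": 2.0, "high": 4.0}    # hypothetical relative construction costs

class TierRouter:
    """Toy budget-tier router: independent softmax logits per module,
    updated with REINFORCE on reward = task_score - lam * total_cost."""

    def __init__(self, lam=0.1, lr=0.5):
        self.logits = {m: [0.0, 0.0, 0.0] for m in MODULES}
        self.lam, self.lr = lam, lr

    def probs(self, module):
        # numerically stable softmax over the three tier logits
        z = self.logits[module]
        mx = max(z)
        e = [math.exp(v - mx) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def route(self):
        # sample one tier index (0=low, 1=mid, 2=high) per module
        return {m: random.choices(range(3), weights=self.probs(m))[0]
                for m in MODULES}

    def update(self, choice, task_score):
        # scalar reward trades off task performance against construction cost
        cost = sum(COSTS[TIERS[i]] for i in choice.values())
        reward = task_score - self.lam * cost
        for m, i in choice.items():
            p = self.probs(m)
            # REINFORCE for a categorical policy: grad log pi(j) = 1[j==i] - p[j]
            for j in range(3):
                g = (1.0 if j == i else 0.0) - p[j]
                self.logits[m][j] += self.lr * reward * g
        return reward
```

Under this sketch, a larger `lam` penalizes expensive tiers more heavily, pushing the learned policy toward Low tiers, which mirrors the accuracy-cost frontier the abstract describes.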
Related papers
- Budget-Aware Agentic Routing via Boundary-Guided Training [24.0709108941881]
Budget-Aware Agentic Routing selects between a cheap and an expensive model at each step to optimize the cost-success frontier. Boundary-Guided Training builds a difficulty taxonomy to anchor learning under sparse rewards. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
arXiv Detail & Related papers (2026-02-04T07:39:27Z)
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning [36.52465672754168]
We introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets. MemOCR allocates memory space with adaptive information density through visual layout. We train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels.
arXiv Detail & Related papers (2026-01-29T09:47:17Z)
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z)
- AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints [7.38359558170225]
Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications. We present AgentBalance, a framework for constructing cost-effective MAS under explicit token-cost and latency budgets.
arXiv Detail & Related papers (2025-12-12T10:08:03Z)
- Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution [52.76038908826961]
We propose ReMe (Remember Me, Refine Me) to bridge the gap between static storage and dynamic reasoning. ReMe innovates across the memory lifecycle via three mechanisms, including multi-faceted distillation, which extracts fine-grained experiences. Experiments on BFCL-V3 and AppWorld demonstrate that ReMe establishes a new state of the art in agent memory systems.
arXiv Detail & Related papers (2025-12-11T14:40:01Z)
- Budget-Aware Tool-Use Enables Effective Agent Scaling [82.6942342482552]
Scaling test-time computation improves performance across different tasks on large language models (LLMs). We study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness.
arXiv Detail & Related papers (2025-11-21T07:18:55Z)
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models [0.0]
BudgetMem is a novel memory-augmented architecture that learns what to remember rather than remembering everything. Our system combines selective memory policies with feature-based salience scoring to decide which information merits storage under strict budget constraints. Our work provides a practical pathway for deploying capable long-context systems on modest hardware, democratizing access to advanced language understanding capabilities.
arXiv Detail & Related papers (2025-11-07T01:49:22Z)
- AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models [11.663809872664105]
Low-Rank Adaptation (LoRA) has emerged as one of the most widely adopted approaches. LoRA is typically applied to the W_Q and W_V projection matrices of self-attention modules. We introduce AILoRA, a novel parameter-efficient method that incorporates function-aware asymmetric low-rank priors.
arXiv Detail & Related papers (2025-10-09T10:13:16Z)
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [89.55738101744657]
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless. We present Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory.
arXiv Detail & Related papers (2025-08-27T12:26:55Z)
- BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens [33.607723102172194]
BudgetThinker is a framework designed to empower Large Language Models with budget-aware reasoning. We show that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets.
arXiv Detail & Related papers (2025-08-24T03:17:50Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR-Router is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.