Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
- URL: http://arxiv.org/abs/2602.06025v1
- Date: Thu, 05 Feb 2026 18:57:09 GMT
- Title: Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
- Authors: Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang
- Abstract summary: BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
- Score: 56.0946692457838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost, which is implemented as a compact neural policy trained with reinforcement learning. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
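The abstract describes the router only at a high level: a compact policy that picks one of three budget tiers (Low/Mid/High) per memory module, trained with reinforcement learning against a performance-cost trade-off. The following is a minimal sketch of that idea, not the paper's actual implementation; the module names, cost values, reward shape (task score minus a cost penalty), and plain REINFORCE update are all illustrative assumptions.

```python
import math
import random

TIERS = ["low", "mid", "high"]                   # budget tiers named in the abstract
MODULES = ["extract", "summarize", "retrieve"]   # hypothetical memory modules
COSTS = {"low": 1.0, "mid": 2.0, "high": 4.0}    # hypothetical relative construction costs

class TierRouter:
    """Toy budget-tier router: independent softmax logits per module,
    updated with REINFORCE on reward = task_score - lam * total_cost."""

    def __init__(self, lam=0.1, lr=0.5):
        self.logits = {m: [0.0, 0.0, 0.0] for m in MODULES}
        self.lam, self.lr = lam, lr

    def probs(self, module):
        # numerically stable softmax over the three tier logits
        z = self.logits[module]
        mx = max(z)
        e = [math.exp(v - mx) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def route(self):
        # sample one tier index (0=low, 1=mid, 2=high) per module
        return {m: random.choices(range(3), weights=self.probs(m))[0]
                for m in MODULES}

    def update(self, choice, task_score):
        # scalar reward trades off task performance against construction cost
        cost = sum(COSTS[TIERS[i]] for i in choice.values())
        reward = task_score - self.lam * cost
        for m, i in choice.items():
            p = self.probs(m)
            # REINFORCE for a categorical policy: grad log pi(j) = 1[j==i] - p[j]
            for j in range(3):
                g = (1.0 if j == i else 0.0) - p[j]
                self.logits[m][j] += self.lr * reward * g
        return reward
```

Under this sketch, a larger `lam` penalizes expensive tiers more heavily, pushing the learned policy toward Low tiers, which mirrors the accuracy-cost frontier the abstract describes.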
Related papers
- Budget-Aware Agentic Routing via Boundary-Guided Training [24.0709108941881]
Budget-Aware Agentic Routing selects between a cheap and an expensive model at each step to optimize the cost-success frontier. Boundary-Guided Training builds a difficulty taxonomy to anchor learning under sparse rewards. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
arXiv Detail & Related papers (2026-02-04T07:39:27Z)
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning [36.52465672754168]
We introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets. MemOCR allocates memory space with adaptive information density through visual layout. We train MemOCR with reinforcement learning under budget-aware objectives that expose the agent to diverse compression levels.
arXiv Detail & Related papers (2026-01-29T09:47:17Z)
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents [57.38404718635204]
Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows. Existing methods typically handle long-term memory (LTM) and short-term memory (STM) as separate components. We propose Agentic Memory (AgeMem), a unified framework that integrates LTM and STM management directly into the agent's policy.
arXiv Detail & Related papers (2026-01-05T08:24:16Z)
- AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints [7.38359558170225]
Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications. We present AgentBalance, a framework for constructing cost-effective MAS under explicit token-cost and latency budgets.
arXiv Detail & Related papers (2025-12-12T10:08:03Z)
- Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution [52.76038908826961]
We propose ReMe (Remember Me, Refine Me) to bridge the gap between static storage and dynamic reasoning. ReMe innovates across the memory lifecycle via three mechanisms, including multi-faceted distillation, which extracts fine-grained experiences. Experiments on BFCL-V3 and AppWorld demonstrate that ReMe establishes a new state of the art in agent memory systems.
arXiv Detail & Related papers (2025-12-11T14:40:01Z)
- Budget-Aware Tool-Use Enables Effective Agent Scaling [82.6942342482552]
Scaling test-time computation improves performance across different tasks on large language models (LLMs). We study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness.
arXiv Detail & Related papers (2025-11-21T07:18:55Z)
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models [0.0]
BudgetMem is a novel memory-augmented architecture that learns what to remember rather than remembering everything. Our system combines selective memory policies with feature-based salience scoring to decide which information merits storage under strict budget constraints. Our work provides a practical pathway for deploying capable long-context systems on modest hardware, democratizing access to advanced language understanding capabilities.
arXiv Detail & Related papers (2025-11-07T01:49:22Z)
- AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models [11.663809872664105]
Low-Rank Adaptation (LoRA) has emerged as one of the most widely adopted approaches. LoRA is typically applied to the W_Q and W_V projection matrices of self-attention modules. We introduce AILoRA, a novel parameter-efficient method that incorporates function-aware asymmetric low-rank priors.
arXiv Detail & Related papers (2025-10-09T10:13:16Z)
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [89.55738101744657]
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless. We present Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory.
arXiv Detail & Related papers (2025-08-27T12:26:55Z)
- BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens [33.607723102172194]
BudgetThinker is a framework designed to empower Large Language Models with budget-aware reasoning. We show that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets.
arXiv Detail & Related papers (2025-08-24T03:17:50Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR-Router is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.