Efficient On-Device Agents via Adaptive Context Management
- URL: http://arxiv.org/abs/2511.03728v1
- Date: Wed, 24 Sep 2025 19:46:50 GMT
- Authors: Sanidhya Vijayvargiya, Rahul Lokesh,
- Abstract summary: On-device AI agents offer the potential for personalized, low-latency assistance, but their deployment is constrained by limited memory capacity. We break the resulting trade-off between rich, stateful interactions and on-device feasibility with a framework for context-efficient on-device agents, driven by three synergistic optimizations. Our agent matches or exceeds the performance of a conventional baseline while dramatically compressing context.
- Score: 1.1172382217477128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device AI agents offer the potential for personalized, low-latency assistance, but their deployment is fundamentally constrained by limited memory capacity, which restricts usable context. This reduced practical context window creates a trade-off between supporting rich, stateful interactions with complex tool capabilities and maintaining on-device feasibility. We break this trade-off with a framework for context-efficient on-device agents, driven by three synergistic optimizations: (1) a dynamic memory system that uses specialized LoRA adapters to distill conversational history into a compressed, structured Context State Object; (2) a minimalist serialization format for tool schemas that minimizes token overhead per tool; and (3) a just-in-time schema-passing mechanism that loads full tool definitions only upon tool selection. We instantiate this framework by adapting a 3B-parameter small language model (SLM) to context-efficient trajectories and rigorously evaluate it against a conventional baseline on complex user tasks. Our agent matches or exceeds the performance of the baseline while dramatically compressing context, achieving more than a 6-fold reduction in initial system prompt context and a 10- to 25-fold reduction in context growth rate, depending on interaction verbosity. These results demonstrate that strategic context management is key to unlocking capable and persistent on-device AI.
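The abstract names optimizations (2) and (3) but does not give the actual serialization format or loading protocol. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `name(arg:type, ...) - description` signature format, the `Tool` and `JITToolRegistry` names, the `set_alarm` and `send_message` example tools, and the four-characters-per-token estimate are all hypothetical. It shows the initial prompt carrying only one compact line per tool, with the full JSON schema supplied only after a tool has been selected.

```python
# Hypothetical sketch: the paper's real schema format and JIT protocol are not given in the abstract.
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool definition: name, one-line description, JSON-Schema-style parameters."""
    name: str
    description: str
    parameters: dict


def full_schema(tool: Tool) -> str:
    """Baseline serialization: the complete schema as pretty-printed JSON."""
    return json.dumps(
        {"name": tool.name, "description": tool.description, "parameters": tool.parameters},
        indent=2,
    )


def compact_signature(tool: Tool) -> str:
    """Assumed minimalist format: `name(arg:type, ...) - description`, one line per tool."""
    props = tool.parameters.get("properties", {})
    args = ", ".join(f"{arg}:{spec.get('type', 'any')}" for arg, spec in props.items())
    return f"{tool.name}({args}) - {tool.description}"


def approx_tokens(text: str) -> int:
    """Crude proxy of ~4 characters per token; a real tokenizer will give different counts."""
    return max(1, len(text) // 4)


@dataclass
class JITToolRegistry:
    """Keeps full schemas out of the prompt until the model has chosen a tool."""
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def system_prompt(self) -> str:
        # Initial context: only the compact one-line signature of every tool.
        return "\n".join(compact_signature(t) for t in self.tools.values())

    def on_tool_selected(self, name: str) -> str:
        # Just-in-time step: return the full definition for appending to the
        # context only once the model has selected `name`.
        return full_schema(self.tools[name])


if __name__ == "__main__":
    registry = JITToolRegistry()
    registry.register(Tool(
        "set_alarm", "Set an alarm on the device.",
        {"type": "object",
         "properties": {"time": {"type": "string", "description": "Alarm time as HH:MM (24-hour)"},
                        "label": {"type": "string", "description": "Optional alarm label"}},
         "required": ["time"]},
    ))
    registry.register(Tool(
        "send_message", "Send a text message to a contact.",
        {"type": "object",
         "properties": {"contact": {"type": "string", "description": "Contact name or number"},
                        "body": {"type": "string", "description": "Message text"}},
         "required": ["contact", "body"]},
    ))
    baseline = "\n".join(full_schema(t) for t in registry.tools.values())
    print(registry.system_prompt())
    print("baseline ~tokens:", approx_tokens(baseline))
    print("compact  ~tokens:", approx_tokens(registry.system_prompt()))
    print(registry.on_tool_selected("set_alarm"))
```

In this sketch the per-tool cost in the initial prompt is a single short line, and the full schema cost is paid only for the tool actually invoked; that is the general trade-off the abstract describes, though the paper's own formats may differ.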
Related papers
- ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation [60.25542764389203]
Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. Existing approaches, relying on manual orchestration or runtime-based patches, often struggle with poor generalization and fragmented optimization. We propose ToolSelf, a novel paradigm enabling tool-driven self-readjustment.
(2026-02-08T09:27:18Z)
- Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios [0.9069311779417014]
This paper introduces an agent framework grounded in real-world practical experience. An end-to-end framework named Jenius-Agent integrates three key optimizations. Experiments show a 20 percent improvement in task accuracy, along with reductions in token cost, response latency, and invocation failures.
(2026-01-05T07:35:12Z)
- Context as a Tool: Context Management for Long-Horizon SWE-Agents [38.950807465620365]
We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions. We show that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines.
(2025-12-26T17:15:47Z)
- Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that, through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
(2025-12-20T12:06:13Z)
- Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks [0.0]
We introduce RP-ReAct, a novel multi-agent approach that decouples strategic planning from low-level execution to achieve superior reliability and efficiency. RP-ReAct consists of a Reasoner Planner Agent (RPA), responsible for planning each sub-step, and one or more Proxy-Execution Agents (PEAs) that translate sub-steps into concrete tool interactions. We evaluate RP-ReAct on the challenging, multi-domain ToolQA benchmark using a diverse set of six open-weight reasoning models.
(2025-12-03T08:28:40Z)
- Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation [3.518072776386001]
This paper proposes Z-Space, a data-generation-oriented multi-agent collaborative tool invocation framework. The framework has been deployed in the Eleme platform's technical division, serving large-scale test data generation scenarios. Production data demonstrate that the system reduces average token consumption in tool inference by 96.26%.
(2025-11-23T03:59:14Z)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses the limitations of existing approaches through systematic design principles. Our framework formalizes agentic reinforcement learning with tool use (ARLT) as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn reinforcement learning with verifiable rewards (RLVR) paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
(2025-09-01T01:45:18Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
(2025-08-06T21:59:34Z)
- AutoLoRA: Automatic LoRA Retrieval and Fine-Grained Gated Fusion for Text-to-Image Generation [32.46570968627392]
Low-rank adaptation (LoRA) has demonstrated efficacy in enabling model customization with minimal parameter overhead. We introduce a novel framework that enables semantic-driven LoRA retrieval and dynamic aggregation. Our approach achieves significant improvement in image generation performance.
(2025-08-04T06:36:00Z)
- ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding [71.654781631463]
ReAgent-V is a novel agentic video understanding framework. It integrates efficient frame selection with real-time reward generation during inference. Extensive experiments on 12 datasets demonstrate significant gains in generalization and reasoning.
(2025-06-02T04:23:21Z)
- Autonomous Deep Agent [0.7489814067742621]
Deep Agent is an advanced autonomous AI system designed to manage complex multi-phase tasks. The system's foundation is built on our Hierarchical Task DAG framework. Deep Agent establishes a novel paradigm in self-governing AI systems.
(2025-02-10T21:46:54Z)
- Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation [0.0]
This study introduces hierarchical embedding augmentation as a means to redefine the representation of tokens through multi-level semantic structures. Results reveal substantial improvements in computational efficiency, with marked reductions in processing overhead for longer input sequences. The ability to dynamically adjust token representations and memory configurations contributed to the model's robustness under varied and unpredictable input conditions.
(2025-01-23T22:20:36Z)
- Asynchronous Tool Usage for Real-Time Agents [61.3041983544042]
We introduce asynchronous AI agents capable of parallel processing and real-time tool use. Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting. This work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
(2024-10-28T23:57:19Z)
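The RCR-Router entry above describes a lightweight scoring policy that selects a role- and stage-relevant subset of a shared memory store for each agent. A minimal sketch of that idea follows, with loud assumptions: keyword overlap is a stand-in for semantic relevance, the greedy per-agent token budget is added for illustration, and `MemoryItem`, `ROLE_KEYWORDS`, `STAGE_KEYWORDS`, `score`, and `route` are hypothetical names, not RCR-Router's actual policy or API.

```python
# Hypothetical sketch of role-aware memory selection; not RCR-Router's actual scoring policy.
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One entry in a shared memory store, with a precomputed token count (assumed available)."""
    text: str
    tokens: int


# Hypothetical keyword tables standing in for role- and stage-specific relevance.
ROLE_KEYWORDS = {
    "planner": {"goal", "plan", "deadline"},
    "executor": {"tool", "result", "error"},
}
STAGE_KEYWORDS = {
    "planning": {"goal", "plan"},
    "execution": {"tool", "result"},
}


def score(item: MemoryItem, role: str, stage: str) -> float:
    """Toy relevance score: keyword overlap with the role plus a half-weighted stage overlap."""
    words = set(item.text.lower().split())
    return len(words & ROLE_KEYWORDS[role]) + 0.5 * len(words & STAGE_KEYWORDS[stage])


def route(memory: list[MemoryItem], role: str, stage: str, budget: int) -> list[MemoryItem]:
    """Greedily keep the highest-scoring items that fit within a per-agent token budget."""
    ranked = sorted(((score(m, role, stage), m) for m in memory),
                    key=lambda pair: pair[0], reverse=True)
    selected, used = [], 0
    for s, item in ranked:
        if s <= 0:
            break  # every remaining item is irrelevant to this role and stage
        if used + item.tokens <= budget:
            selected.append(item)
            used += item.tokens
    return selected
```

Each agent thus receives only the slice of shared memory that scores as relevant to its role and current stage, capped by its own budget; a real system would replace the keyword tables with a learned or embedding-based scorer.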
This list is automatically generated from the titles and abstracts of the papers in this site.