Related papers: Context as a Tool: Context Management for Long-Horizon SWE-Agents

URL: http://arxiv.org/abs/2512.22087v1
Date: Fri, 26 Dec 2025 17:15:47 GMT
Title: Context as a Tool: Context Management for Long-Horizon SWE-Agents
Authors: Shukai Liu, Jian Yang, Bo Jiang, Yizhi Li, Jinyang Guo, Xianglong Liu, Bryan Dai
Abstract summary: We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions. We show that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines.
Score: 38.950807465620365
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CAT-GENERATOR, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
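The three-part workspace described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the class `ContextWorkspace`, its methods, the `keep_last` parameter, and the placeholder `naive_summary` function are all hypothetical names introduced here. In CAT the compression would be produced by the model itself at a milestone it chooses; here a trivial string summary stands in for that step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContextWorkspace:
    """Hypothetical workspace with the three parts named in the abstract."""
    task: str                                             # stable task semantics, never compressed
    long_term: List[str] = field(default_factory=list)    # condensed long-term memory
    short_term: List[str] = field(default_factory=list)   # high-fidelity recent interactions

    def record(self, interaction: str) -> None:
        """Append one raw interaction to short-term context."""
        self.short_term.append(interaction)

    def compress(self, summarize: Callable[[List[str]], str], keep_last: int = 2) -> None:
        """Context management as a callable tool: fold older steps into one summary."""
        if len(self.short_term) <= keep_last:
            return  # nothing old enough to fold away
        cut = len(self.short_term) - keep_last
        old, recent = self.short_term[:cut], self.short_term[cut:]
        self.long_term.append(summarize(old))
        self.short_term = recent

    def render(self) -> str:
        """Build the prompt text: task first, then summaries, then raw recent steps."""
        parts = ["TASK: " + self.task]
        parts += ["MEMORY: " + s for s in self.long_term]
        parts += ["STEP: " + s for s in self.short_term]
        return "\n".join(parts)


def naive_summary(steps: List[str]) -> str:
    # Placeholder for a model-written summary (an assumption of this sketch).
    return f"{len(steps)} earlier steps, ending with: {steps[-1]}"


ws = ContextWorkspace(task="Fix the failing parser test")
for i in range(5):
    ws.record(f"step {i}")
ws.compress(naive_summary, keep_last=2)
print(ws.render())
```

Because the agent decides when to call `compress`, the rendered context stays bounded while the task statement is never rewritten, which is the contrast with append-only histories that the abstract draws.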

Related papers

Agentic Spatio-Temporal Grounding via Collaborative Reasoning [80.83158605034465]
Temporal Video Grounding aims to retrieve the temporal tube of a target object or person in a video given a text query. We propose the Agentic Spatio-Temporal Grounder (ASTG) framework for the task of STVG towards an open-world and training-free scenario. Specifically, two specialized agents, SRA (Spatial Reasoning Agent) and TRA (Temporal Reasoning Agent), are constructed using modern Multimodal Large Language Models (MLLMs). Experiments on popular benchmarks demonstrate the superiority of the proposed approach, where it outperforms existing weakly-supervised and zero-shot approaches by a margin.
arXiv Detail & Related papers (2026-02-10T10:16:27Z)
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts [78.33143446024485]
We introduce AgentLongBench, which evaluates agents through simulated environment rollouts based on Lateral Thinking Puzzles. This framework generates rigorous interaction trajectories across knowledge-intensive and knowledge-free scenarios.
arXiv Detail & Related papers (2026-01-28T16:05:44Z)
ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents [9.76162701959422]
ARC is a framework for systematically formulating context management. It treats context as a dynamic internal reasoning state during execution. It consistently outperforms passive context compression methods.
arXiv Detail & Related papers (2026-01-17T12:17:50Z)
CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning [12.710191300398924]
We introduce CoDA, a reinforcement learning framework that decouples high-level planning from low-level execution. CoDA achieves significant performance improvements over state-of-the-art baselines on complex multi-hop question-answering benchmarks.
arXiv Detail & Related papers (2025-12-14T14:41:29Z)
Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks [0.0]
We introduce RP-ReAct, a novel multi-agent approach that decouples strategic planning from low-level execution to achieve superior reliability and efficiency. RP-ReAct consists of a Reasoner Planner Agent (RPA), responsible for planning each sub-step, and one or more Proxy-Execution Agents (PEAs) that translate sub-steps into concrete tool interactions. We evaluate RP-ReAct on the challenging, multi-domain ToolQA benchmark using a diverse set of six open-weight reasoning models.
arXiv Detail & Related papers (2025-12-03T08:28:40Z)
Scaling Long-Horizon LLM Agent via Context-Folding [46.685552398338295]
We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome.
arXiv Detail & Related papers (2025-10-13T22:00:58Z)
COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context [17.575806280348797]
Small errors compound across steps, and even state-of-the-art models often hallucinate or lose coherence. We propose a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context organization into three specialized components.
arXiv Detail & Related papers (2025-10-09T20:14:26Z)
ContextNav: Towards Agentic Multimodal In-Context Learning [85.05420047017513]
ContextNav is an agentic framework that integrates the scalability of automated retrieval with the quality and adaptiveness of human-like curation. It builds a resource-aware multimodal embedding pipeline, maintains a retrievable vector database, and applies agentic retrieval and structural alignment to construct noise-resilient contexts. Experimental results demonstrate that ContextNav achieves state-of-the-art performance across various datasets.
arXiv Detail & Related papers (2025-10-06T07:49:52Z)
Efficient On-Device Agents via Adaptive Context Management [1.1172382217477128]
On-device AI agents offer the potential for personalized, low-latency assistance, but their deployment is constrained by limited memory capacity. We break this trade-off with a framework for context-efficient on-device agents, driven by three synergistic optimizations. Our agent matches, or exceeds, the performance of a conventional baseline while dramatically compressing context.
arXiv Detail & Related papers (2025-09-24T19:46:50Z)
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction [84.90394416593624]
Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step interactions. Existing simulation-based data generation methods rely heavily on costly autoregressive interactions between multiple agents. We propose a novel Non-Autoregressive Iterative Generation framework, called ToolACE-MT, for constructing high-quality multi-turn agentic dialogues.
arXiv Detail & Related papers (2025-08-18T07:38:23Z)
Less is More: Empowering GUI Agent with Context-Aware Simplification [62.02157661751793]
We propose a context-aware framework for building an efficient and effective GUI Agent, termed SimpAgent. With the above components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance.
arXiv Detail & Related papers (2025-07-04T17:37:15Z)
Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.