EET: Experience-Driven Early Termination for Cost-Efficient Software Engineering Agents
- URL: http://arxiv.org/abs/2601.05777v1
- Date: Fri, 09 Jan 2026 13:01:49 GMT
- Title: EET: Experience-Driven Early Termination for Cost-Efficient Software Engineering Agents
- Authors: Yaoqi Guo, Ying Xiao, Jie M. Zhang, Mark Harman, Yiling Lou, Yang Liu, Zhenpeng Chen,
- Abstract summary: EET is an experience-driven early termination approach for software engineering agents. It reduces the cost of SE agents while preserving task performance. EET consistently reduces total cost by 19%-55%, with negligible loss in resolution rate.
- Score: 22.98266662213199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software engineering (SE) agents powered by large language models are increasingly adopted in practice, yet they often incur substantial monetary cost. We introduce EET, an experience-driven early termination approach that reduces the cost of SE agents while preserving task performance. EET extracts structured experience from prior issue-resolution executions and leverages it to guide early termination during patch generation and selection, reducing unproductive iterations. We evaluate EET on the SWE-bench Verified benchmark across three representative SE agents. EET consistently reduces total cost by 19%-55% (32% on average), with negligible loss in resolution rate (at most 0.2%). These efficiency gains are achieved, on average, by identifying early-termination opportunities for 11% of issues and reducing API calls, input tokens, and output tokens by 21%, 30%, and 25%, respectively. We release the code, prompts, and data at https://github.com/EffiSEAgent/EET.
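The abstract's core loop, mining the iteration counts of past successful resolutions and stopping a run once it exceeds what experience says is productive, might be sketched as follows. All names, the fallback budget, and the quantile rule are illustrative assumptions, not EET's actual design; see the linked repository for the real implementation.

```python
# Hypothetical sketch of experience-driven early termination: halt an
# agent's patch-generation loop once it passes the iteration count by
# which most past successful runs had already finished.
from dataclasses import dataclass, field


@dataclass
class ExperienceStore:
    # Iterations at which prior *successful* resolutions finished.
    success_iters: list = field(default_factory=list)

    def record(self, iteration):
        self.success_iters.append(iteration)

    def cutoff(self, quantile=0.9):
        """Iteration by which `quantile` of past successes had finished."""
        if not self.success_iters:
            return 50  # fallback budget when no experience is available
        ranked = sorted(self.success_iters)
        idx = min(int(quantile * len(ranked)), len(ranked) - 1)
        return ranked[idx]


def run_agent(store, try_patch):
    """Iterate up to the experience-derived budget; stop early otherwise."""
    budget = store.cutoff()
    for i in range(1, budget + 1):
        patch = try_patch(i)
        if patch is not None:       # success: feed the experience store
            store.record(i)
            return patch
    return None                     # early termination: stop paying for calls
```

For example, with past successes at iterations 2, 3, and 4, the budget is 4, so an agent still patchless at iteration 5 would be cut off rather than allowed to keep burning API calls.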
Related papers
- Agentic Test-Time Scaling for WebAgents [65.5178428849495]
We present Confidence-Aware Test-Time Scaling (CATTS), which uses vote-derived uncertainty to allocate compute only when decisions are genuinely contentious. CATTS improves performance on WebArena-Lite and GoBrowse by up to 9.1% over ReAct while using up to 2.3x fewer tokens than uniform scaling.
arXiv Detail & Related papers (2026-02-12T18:58:30Z)
- SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents [12.355536750226555]
Test-time scaling has been widely adopted to enhance the capabilities of Large Language Model (LLM) agents in software engineering tasks. We introduce SWE-Replay, the first efficient and generalizable test-time scaling technique for modern agents without reliance on potentially noisy value estimates. Our evaluation shows that, on SWE-bench Verified, SWE-Replay consistently outperforms naive scaling, reducing costs by up to 17.4% while maintaining or even improving performance by up to 3.8%.
arXiv Detail & Related papers (2026-01-29T18:50:29Z)
- DEPO: Dual-Efficiency Preference Optimization for LLM Agents [75.6723341304463]
We propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance.
arXiv Detail & Related papers (2025-11-19T12:38:43Z)
- Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling [55.026048429595384]
Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs. We propose SeerSC, a dynamic self-consistency framework that simultaneously improves token efficiency and latency.
arXiv Detail & Related papers (2025-11-12T13:57:43Z)
- Intra-request branch orchestration for efficient LLM reasoning [52.68946975865865]
Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms to improve accuracy on complex tasks. Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors. We present DUCHESS, an LLM serving system that reduces cost and latency without sacrificing accuracy through intra-request branch orchestration guided by predictions.
arXiv Detail & Related papers (2025-09-29T15:52:08Z)
- Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability [14.00844847268286]
Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediary layers. Our framework demonstrates consistent improvements in speedup (1.70-2.10x) with a minimal performance drop (2%) compared to full model performance.
arXiv Detail & Related papers (2025-09-28T06:05:24Z)
- Improving the Efficiency of LLM Agent Systems through Trajectory Reduction [6.087402350213508]
This paper introduces an inference-time trajectory reduction approach to reduce the cost of agents. We show that AgentDiet can reduce input tokens by 39.9%-59.7%, or the final computational cost by 21.1%-35.9%, while maintaining the same agent performance.
arXiv Detail & Related papers (2025-09-28T02:43:41Z)
- Efficient Agents: Building Effective Agents While Reducing Cost [48.65558640786415]
Large Language Model (LLM)-driven agents have enabled sophisticated systems to tackle complex, multi-step tasks. This work presents the first systematic study of the efficiency-effectiveness trade-off in modern agent systems.
arXiv Detail & Related papers (2025-07-24T17:56:51Z)
- Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments [54.67512489842682]
Large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments. We take a first step toward exploring the early-exit behavior of LLM-based agents.
arXiv Detail & Related papers (2025-05-23T08:23:36Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously.
We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z)
- Explicit and Implicit Semantic Ranking Framework [13.356884800150457]
We introduce a generic semantic learning-to-rank framework, Self-training Semantic Cross-attention Ranking (sRank).
This framework uses linear pairwise loss with mutable training batch sizes and achieves quality gains and high efficiency.
It has been applied effectively to show gains on two industry tasks at Microsoft over real-world large-scale data sets.
arXiv Detail & Related papers (2023-04-11T01:10:49Z)
- ANDREAS: Artificial intelligence traiNing scheDuler foR accElerAted resource clusterS [1.798617052102518]
We propose ANDREAS, an advanced scheduling solution to maximize performance and minimize Data Centers operational costs.
Experiments show that we can achieve a cost reduction between 30% and 62% on average with respect to first-principle methods.
arXiv Detail & Related papers (2021-05-11T14:36:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.