EGSS: Entropy-guided Stepwise Scaling for Reliable Software Engineering
- URL: http://arxiv.org/abs/2602.05242v1
- Date: Thu, 05 Feb 2026 03:02:54 GMT
- Title: EGSS: Entropy-guided Stepwise Scaling for Reliable Software Engineering
- Authors: Chenhui Mao, Yuanting Lei, Zhixiang Wei, Ming Liang, Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang, Yong Li
- Abstract summary: Agentic Test-Time Scaling (TTS) has delivered state-of-the-art (SOTA) performance on complex software engineering tasks such as code generation and bug fixing. We propose Entropy-Guided Stepwise Scaling (EGSS), a novel TTS framework that balances efficiency and effectiveness through entropy-guided adaptive search and robust test-suite augmentation.
- Score: 14.718324012970944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agentic Test-Time Scaling (TTS) has delivered state-of-the-art (SOTA) performance on complex software engineering tasks such as code generation and bug fixing. However, its practical adoption remains limited due to significant computational overhead, primarily driven by two key challenges: (1) the high cost associated with deploying excessively large ensembles, and (2) the lack of a reliable mechanism for selecting the optimal candidate solution, ultimately constraining the performance gains that can be realized. To address these challenges, we propose Entropy-Guided Stepwise Scaling (EGSS), a novel TTS framework that dynamically balances efficiency and effectiveness through entropy-guided adaptive search and robust test-suite augmentation. Extensive experiments on SWE-Bench-Verified demonstrate that EGSS consistently boosts performance by 5-10% across all evaluated models. Specifically, it increases the resolved ratio of Kimi-K2-Instruct from 63.2% to 72.2%, and GLM-4.6 from 65.8% to 74.6%. Furthermore, when paired with GLM-4.6, EGSS achieves a new state-of-the-art among open-source large language models. In addition to these accuracy improvements, EGSS reduces inference-time token usage by over 28% compared to existing TTS methods, achieving simultaneous gains in both effectiveness and computational efficiency.
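The abstract does not spell out EGSS's selection mechanism, but entropy-guided candidate selection can be sketched roughly as follows: score each sampled candidate solution by the mean token-level entropy of its generation, pick the most confident one, and spend extra samples only when even the best candidate is uncertain. The function names, the fixed threshold, and the use of mean entropy as the confidence proxy are illustrative assumptions here, not the paper's actual algorithm.

```python
import math

def token_entropy(probs):
    # Shannon entropy (nats) of one token's probability distribution
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(step_distributions):
    # average per-token entropy over a candidate's generation steps
    return sum(token_entropy(p) for p in step_distributions) / len(step_distributions)

def entropy_guided_select(candidates, expand_threshold=1.0):
    """Rank candidates by mean entropy (lower = more confident).

    candidates: list of (solution, step_distributions) pairs.
    Returns (best_solution, needs_more_samples) -- the second flag signals
    that even the most confident candidate exceeds the entropy threshold,
    so further sampling may be warranted.
    """
    scored = sorted(candidates, key=lambda c: mean_entropy(c[1]))
    best_solution, best_dists = scored[0]
    return best_solution, mean_entropy(best_dists) > expand_threshold
```

In this sketch the threshold plays the role of an adaptive-search knob: low-entropy (confident) runs terminate early, while high-entropy runs trigger additional scaling.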
Related papers
- ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT [14.21482208417138]
Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. We propose ToaSt, a decoupled framework applying specialized strategies to distinct ViT components.
arXiv Detail & Related papers (2026-02-17T16:52:13Z)
- SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning [54.393763477932474]
Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has emerged as the standard post-training paradigm for large language models (LLMs). We propose SED-SFT, which adaptively encourages diversity based on the token exploration space. This framework introduces a selective entropy regularization term with a selective masking mechanism into the optimization objective.
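As a rough illustration of the idea (not SED-SFT's actual objective), a selective entropy regularizer adds a negative-entropy term only at positions a mask singles out, so diversity is encouraged where the token exploration space warrants it. The mask semantics, weighting, and function names below are assumptions.

```python
import math

def selective_entropy_loss(token_probs, token_mask, lam=0.1):
    """Sketch of a selectively masked entropy regularizer.

    token_probs: per-position probability distributions.
    token_mask: 1 where diversity should be encouraged, 0 elsewhere.
    Returns the regularization term to add to the SFT loss; it is the
    negated entropy of the masked positions, so minimizing the total loss
    pushes those positions toward higher entropy (more diversity).
    """
    reg = 0.0
    for probs, m in zip(token_probs, token_mask):
        if m:
            ent = -sum(p * math.log(p) for p in probs if p > 0)
            reg -= ent  # subtract entropy: low-diversity positions are penalized
    return lam * reg
```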
arXiv Detail & Related papers (2026-02-07T09:39:21Z)
- SWE-RM: Execution-free Feedback For Software Engineering Agents [61.86380395896069]
Execution-based feedback is widely used in the development of coding agents through test-time scaling (TTS) and reinforcement learning (RL). In contrast, execution-free feedback from reward models can provide more fine-grained signals without depending on unit test cases. We introduce SWE-RM, an accurate and robust reward model adopting a mixture-of-experts architecture with 30B total parameters and 3B activated during inference.
arXiv Detail & Related papers (2025-12-26T08:26:18Z)
- Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG [6.767263284839525]
Driver drowsiness remains a primary cause of traffic accidents, necessitating the development of real-time, reliable detection systems. This study presents a Modified TSception architecture designed for the robust assessment of driver fatigue using Electroencephalography (EEG). The architecture's generalizability is validated on the STEW mental workload dataset.
arXiv Detail & Related papers (2025-12-25T17:48:11Z)
- DEPO: Dual-Efficiency Preference Optimization for LLM Agents [75.6723341304463]
We propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance.
arXiv Detail & Related papers (2025-11-19T12:38:43Z)
- IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method [59.02943805284446]
The Iterative Implicit Euler Transformer (IIET) is proposed. IIAD allows users to effectively balance the performance-efficiency trade-off. The E-IIET variant achieves an average performance gain exceeding 1.6% over the vanilla Transformer at comparable speed.
arXiv Detail & Related papers (2025-09-26T15:14:03Z)
- ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism [10.913346263482786]
We introduce an entropy-based mechanism to enhance the exploration-exploitation balance in test-time reinforcement learning. Compared with the baseline, our approach enables Llama3.1-8B to achieve a 68% relative improvement in the Pass@1 metric.
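The abstract only names an entropy mechanism; one common way such a mechanism is realized (an assumption here, not necessarily ETTRL's specific design) is an entropy bonus added to the task reward, so the policy is rewarded for keeping its action distribution diffuse while exploring:

```python
import math

def policy_entropy(probs):
    # Shannon entropy (nats) of the policy's action distribution
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_regularized_return(task_reward, probs, beta=0.01):
    # exploration bonus: higher-entropy (more exploratory) policies earn
    # extra reward; beta trades off exploration against exploitation
    return task_reward + beta * policy_entropy(probs)
```

Annealing `beta` toward zero over the course of optimization is one standard way to shift from exploration to exploitation.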
arXiv Detail & Related papers (2025-08-15T09:49:14Z)
- Faster and Better LLMs via Latency-Aware Test-Time Scaling [47.3923926808606]
Test-Time Scaling (TTS) has proven effective in improving the performance of Large Language Models (LLMs) during inference. Existing research has overlooked the efficiency of TTS from a latency-sensitive perspective. We demonstrate that a compute-optimal TTS does not always result in the lowest latency in scenarios where latency is critical.
arXiv Detail & Related papers (2025-05-26T07:51:30Z)
- Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation [34.99437411281915]
This paper proposes two ViT-based models for accurate, efficient, and robust 2D pose estimation. Experiments on six benchmarks demonstrate that the proposed methods significantly outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-02-28T22:34:22Z)
- MRSO: Balancing Exploration and Exploitation through Modified Rat Swarm Optimization for Global Optimization [3.7503163440313463]
This study introduces Modified Rat Swarm Optimization (MRSO) to enhance the balance between exploration and exploitation.
MRSO incorporates unique modifications to improve search efficiency and durability, making it suitable for challenging engineering problems such as welded beam, pressure vessel, and gear train design.
In the CEC 2019 benchmarks, MRSO outperforms the standard RSO in six out of ten functions, demonstrating superior global search capabilities.
arXiv Detail & Related papers (2024-09-20T14:52:25Z)
- ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
arXiv Detail & Related papers (2022-03-23T08:47:01Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with improved beam search, achieves quality only 3.8% WER absolute worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)