Related papers: The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

URL: http://arxiv.org/abs/2506.18403v2
Date: Sun, 13 Jul 2025 09:04:33 GMT
Title: The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Authors: Muntasir Adnan, Carlos C. N. Kuhn
Abstract summary: We introduce the Debugging Decay Index (DDI), a mathematical framework that quantifies when debugging becomes ineffective and predicts intervention points. DDI reveals a fundamental limitation in current AI debugging and provides the first quantitative framework for optimising iterative code generation strategies.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The effectiveness of AI debugging follows a predictable exponential decay pattern; most models lose 60-80% of their debugging capability within just 2-3 attempts, despite iterative debugging being a critical capability for practical code generation systems. We introduce the Debugging Decay Index (DDI), a mathematical framework that quantifies when debugging becomes ineffective and predicts intervention points. Our strategic fresh start approach shifts from exploitation to exploration at strategic points in the debugging process, demonstrating that well-timed interventions can rescue the effectiveness of debugging. DDI reveals a fundamental limitation in current AI debugging and provides the first quantitative framework for optimising iterative code generation strategies.
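As a rough illustration of the decay pattern the abstract describes, the sketch below assumes a plain exponential model, E(t) = E0 * exp(-lambda * t), where t is the number of debugging attempts. This is an assumption made for illustration only: the abstract does not give the paper's DDI formula, and the function names here are hypothetical.

```python
import math


def decay_rate(loss_fraction: float, attempts: float) -> float:
    """Rate lambda at which `loss_fraction` of effectiveness is lost after `attempts`.

    Assumes E(t) = E0 * exp(-lambda * t); this is an illustrative model,
    not the formulation used in the paper.
    """
    if not 0.0 < loss_fraction < 1.0 or attempts <= 0:
        raise ValueError("need 0 < loss_fraction < 1 and attempts > 0")
    return -math.log(1.0 - loss_fraction) / attempts


def effectiveness(initial: float, rate: float, attempt: float) -> float:
    """Remaining effectiveness after `attempt` attempts under the assumed model."""
    return initial * math.exp(-rate * attempt)


def attempts_until_loss(rate: float, loss_fraction: float) -> float:
    """Number of attempts after which `loss_fraction` of effectiveness is gone."""
    if not 0.0 < loss_fraction < 1.0 or rate <= 0:
        raise ValueError("need 0 < loss_fraction < 1 and rate > 0")
    return -math.log(1.0 - loss_fraction) / rate


if __name__ == "__main__":
    # The two corners of the range quoted in the abstract:
    # 60% lost within 3 attempts, and 80% lost within 2 attempts.
    for loss, n in [(0.6, 3), (0.8, 2)]:
        lam = decay_rate(loss, n)
        half = attempts_until_loss(lam, 0.5)
        print(f"loss={loss:.0%} in {n} attempts: lambda={lam:.3f}, half-life={half:.2f} attempts")
```

Under this assumed model, a threshold such as "80% of effectiveness lost" maps directly to an attempt count, which is one simple way to think about where a fresh start might be triggered.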

Related papers

How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices [81.85465545346266]
Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm. Yet, current open-source DLMs often underperform their AR counterparts in speed, limiting their real-world utility. This work presents a systematic study of DLM efficiency, identifying key issues in prior evaluation methods.
arXiv Detail & Related papers (2025-10-21T10:00:32Z)
InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose. We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z)
LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling [52.1366057696919]
LOP is an efficient neural pruning framework that learns optimal pruning strategies from the target pruning constraint. The LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies adaptive to the target pruning constraint. Experimental results show that LOP outperforms state-of-the-art pruning methods in various metrics while achieving up to three orders of magnitude speedup.
arXiv Detail & Related papers (2025-06-15T12:14:16Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient [18.077143014067126]
Unsupervised Outlier Detection (UOD) is a critical task in data mining and machine learning, aiming to identify instances that significantly deviate from the majority. Without any labels, deep UOD methods struggle with the misalignment between the model's direct optimization goal and the final performance goal of the outlier detection task. This paper proposes an early stopping algorithm to optimize the training of deep UOD models, ensuring they perform optimally on the outlier detection task.
arXiv Detail & Related papers (2024-12-11T16:07:58Z)
COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis [29.667170755786508]
We introduce EVAL, a benchmark for evaluating the abilities of Large Language Models. We propose the COmmunicative Agent-based data SynThesis framework, which employs a multi-agent system to generate high-quality training data. Results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data.
arXiv Detail & Related papers (2024-08-09T11:35:44Z)
Landscape-Aware Growing: The Power of a Little LAG [49.897766925371485]
We study the question of how to select the best growing strategy from a given pool of growing strategies. We present an alternative perspective based on early training dynamics, which we call "landscape-aware growing (LAG)".
arXiv Detail & Related papers (2024-06-04T16:38:57Z)
PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce. We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD. Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
Accelerating System-Level Debug Using Rule Learning and Subgroup Discovery Techniques [1.6317061277457001]
We describe how it provides high quality debug hints for reducing the debug effort. As a case study, we used these techniques for root-causing failures of the Power Management (PM) design feature Package-C8. We propose an approach for mining the root-causing experience and results for reuse, to accelerate future debug activities and reduce dependency on validation experts.
arXiv Detail & Related papers (2022-07-02T22:00:30Z)
Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Our goal is to misclassify a specific sample into a target class without any sample modification. By utilizing the latest technique in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images. Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)