Enhancing Automated Program Repair via Faulty Token Localization and Quality-Aware Patch Refinement
- URL: http://arxiv.org/abs/2511.18001v1
- Date: Sat, 22 Nov 2025 10:05:26 GMT
- Title: Enhancing Automated Program Repair via Faulty Token Localization and Quality-Aware Patch Refinement
- Authors: Jiaolong Kong, Xiaofei Xie, Yiheng Xiong, Yuekun Wang, Jian Wang,
- Abstract summary: TokenRepair is a novel two-level refinement framework for program repair.<n>It integrates internal reflection for localizing potentially faulty tokens with external feedback for quality-aware patch refinement.<n> TokenRepair achieves new state-of-the-art repair performance, correctly fixing 88 bugs on Defects4J 1.2 and 139 bugs on HumanEval-Java.
- Score: 15.978451025074962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently demonstrated strong potential for automated program repair (APR). However, existing LLM-based techniques primarily rely on coarse-grained external feedback (e.g.,test results) to guide iterative patch generation, while lacking fine-grained internal signals that reveal why a patch fails or which parts of the generated code are likely incorrect. This limitation often leads to inefficient refinement, error propagation, and suboptimal repair performance. In this work, we propose TokenRepair, a novel two-level refinement framework that enhances APR by integrating internal reflection for localizing potentially faulty tokens with external feedback for quality-aware patch refinement. Specifically, TokenRepair first performs internal reflection by analyzing context-aware token-level uncertainty fluctuations to identify suspicious or low-confidence tokens within a patch. It then applies Chain-of-Thought guided rewriting to refine only these localized tokens, enabling targeted and fine-grained correction. To further stabilize the iterative repair loop, TokenRepair incorporates a quality-aware external feedback mechanism that evaluates patch quality and filters out low-quality candidates before refinement. Experimental results show that TokenRepair achieves new state-of-the-art repair performance, correctly fixing 88 bugs on Defects4J 1.2 and 139 bugs on HumanEval-Java, demonstrating substantial improvements ranging from 8.2% to 34.9% across all models on Defects4J 1.2 and from 3.3% to 16.1% on HumanEval-Java.
Related papers
- Specification Vibing for Automated Program Repair [8.68148153927532]
VibeRepair is a specification-centric APR technique that treats repair as behavior-specification repair rather than ad-hoc code editing.<n>On Defects4J v1.2, VibeRepair correctly repairs 174 bugs, exceeding the strongest state-of-the-art baseline by 28 bugs.<n>On Defects4J v2.0, it repairs 178 bugs, outperforming prior approaches by 33 bugs, representing a 23% improvement.
arXiv Detail & Related papers (2026-02-09T04:44:58Z) - Well Begun is Half Done: Location-Aware and Trace-Guided Iterative Automated Vulnerability Repair [8.461073497106222]
Existing vulnerability repair approaches ignore the concern of locations that need to be patched and focus solely on the repair content.<n>We propose sysname, an LLM-based approach that provides information about where should be patched first.<n>sysname improves the iterative repair strategy by assessing the quality of test-failing patches and selecting the best patch for the next iteration.
arXiv Detail & Related papers (2025-12-23T09:54:22Z) - CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal [84.71254539482369]
Group-relative reinforcement learning with verifiable rewards (RLVR) often wastes the most informative data it already has the failures.<n>We present CARE, a failure-centric post-training framework for multimodal reasoning that turns errors into supervision.<n> CARE improves accuracy and training smoothness while explicitly increasing the share of learning signal that comes from failures.
arXiv Detail & Related papers (2025-12-22T16:34:21Z) - Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers [90.50039419576807]
Reinforcement Learning with Verifiable Rewards (RLVR) trains policies against automated verifiers to avoid costly human labeling.<n>To reduce vulnerability to verifier hacking, many RLVR systems collapse rewards to binary $0,1$ during training.<n>This choice carries a cost: it introduces textitfalse negatives (rejecting correct answers, FNs) and textitfalse positives (accepting incorrect ones, FPs)
arXiv Detail & Related papers (2025-10-01T13:56:44Z) - Refactoring $\neq$ Bug-Inducing: Improving Defect Prediction with Code Change Tactics Analysis [54.361900378970134]
Just-in-time defect prediction (JIT-DP) aims to predict the likelihood of code changes resulting in software defects at an early stage.<n>Prior research has largely ignored code during both the evaluation and methodology phases, despite its prevalence.<n>We propose Code chAnge Tactics (CAT) analysis to categorize code and its propagation, which improves labeling accuracy in the JIT-Defects4J dataset by 13.7%.
arXiv Detail & Related papers (2025-07-25T23:29:25Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search [41.50068103527948]
We propose ReinFix, a framework that searches for repair ingredients throughout the reasoning and solution phases of bug fixing.<n>During the solution phase, ReinFix searches for external ingredients from historical bug fixes with similar bug patterns.<n> Evaluations on two popular benchmarks demonstrate the effectiveness of our approach over SOTA baselines.
arXiv Detail & Related papers (2025-06-29T06:02:11Z) - The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models [48.073219761367184]
We investigate an APR pipeline that balances the generation of multiple outputs and multiple rounds of iteration.<n>We fine-tune each model on an APR dataset with three sizes (1K, 30K, 65K) and two techniques (Full Fine-Tuning and LoRA)<n>Our results show that by using only a fraction (1%) of the fine-tuning dataset, we can achieve improvements of up to 78% in the number of plausible patches generated.
arXiv Detail & Related papers (2025-05-05T18:06:51Z) - MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators [53.91199933655421]
Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment.<n>We introduce a universal and training-free framework, $textbfMQM-APE, based on the idea of filtering out non-impactful errors.<n>Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM.
arXiv Detail & Related papers (2024-09-22T06:43:40Z) - Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z) - Boosting Redundancy-based Automated Program Repair by Fine-grained Pattern Mining [18.7107522872479]
We propose a new repair technique named Repatt, which incorporates a two-level pattern mining process for guiding effective patch generation.<n>We have conducted an experiment on the widely-used Defects4J benchmark and compared Repatt with ten state-of-the-art APR approaches.
arXiv Detail & Related papers (2023-12-26T08:42:32Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.