Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search
- URL: http://arxiv.org/abs/2506.23100v1
- Date: Sun, 29 Jun 2025 06:02:11 GMT
- Title: Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search
- Authors: Jiayi Zhang, Kai Huang, Jian Zhang, Yang Liu, Chunyang Chen
- Abstract summary: We propose ReinFix, a framework that searches for repair ingredients throughout the reasoning and solution phases of bug fixing. During the solution phase, ReinFix searches for external ingredients from historical bug fixes with similar bug patterns. Evaluations on two popular benchmarks demonstrate the effectiveness of our approach over SOTA baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated Program Repair (APR) techniques aim to automatically fix buggy programs. Among these, Large Language Model-based (LLM-based) approaches have shown great promise. Recent advances demonstrate that directly leveraging LLMs can achieve leading results. However, these techniques remain suboptimal in generating contextually relevant and accurate patches, as they often overlook repair ingredients crucial for practical program repair. In this paper, we propose ReinFix, a novel framework that enables LLMs to autonomously search for repair ingredients throughout both the reasoning and solution phases of bug fixing. In the reasoning phase, ReinFix integrates static analysis tools to retrieve internal ingredients, such as variable definitions, to assist the LLM in root cause analysis when it encounters difficulty understanding the context. During the solution phase, when the LLM lacks experience in fixing specific bugs, ReinFix searches for external ingredients from historical bug fixes with similar bug patterns, leveraging both the buggy code and its root cause to guide the LLM in identifying appropriate repair actions, thereby increasing the likelihood of generating correct patches. Evaluations on two popular benchmarks (Defects4J V1.2 and V2.0) demonstrate the effectiveness of our approach over SOTA baselines. Notably, ReinFix fixes 146 bugs, which is 32 more than the baselines on Defects4J V1.2. On Defects4J V2.0, ReinFix fixes 38 more bugs than the SOTA. Importantly, when evaluated on recent benchmarks that are free of data-leakage risk, ReinFix still achieves the best performance.
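As an illustration of the two-phase ingredient search described in the abstract, here is a minimal sketch in Python; query_llm, analyzer, fix_db, and the NEED_DEFINITION marker are all assumptions made for illustration, not the authors' implementation.

```python
def reinfix_repair(buggy_code, query_llm, analyzer, fix_db):
    # Reasoning phase: ask for a root cause, letting the model request
    # internal ingredients (e.g. variable definitions) when context is missing.
    root_cause = query_llm(
        "Analyze the root cause of this bug. If a symbol is unclear, reply\n"
        f"'NEED_DEFINITION: <symbol>'.\n{buggy_code}"
    )
    while "NEED_DEFINITION:" in root_cause:
        symbol = root_cause.split("NEED_DEFINITION:")[1].strip().split()[0]
        definition = analyzer.lookup_definition(symbol)  # static analysis tool
        root_cause = query_llm(
            f"Bug:\n{buggy_code}\nDefinition of {symbol}:\n{definition}\n"
            "Re-analyze the root cause."
        )

    # Solution phase: retrieve external ingredients -- historical fixes with
    # similar bug patterns -- keyed on both the buggy code and its root cause.
    examples = fix_db.search(buggy_code, root_cause, top_k=3)
    prompt = (
        "Similar historical fixes:\n" + "\n".join(examples) +
        f"\nRoot cause: {root_cause}\nBuggy code:\n{buggy_code}\nProduce a patch."
    )
    return query_llm(prompt)
```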
Related papers
- Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language. We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program. We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - Studying and Understanding the Effectiveness and Failures of Conversational LLM-Based Repair [3.93048798243871]
Automated program repair (APR) is designed to automate the process of bug-fixing. Advanced APR techniques powered by conversational large language models (LLMs) have exhibited impressive repair abilities. Despite this superiority, conversational APR techniques still fail to repair a large number of bugs.
arXiv Detail & Related papers (2025-03-19T09:39:32Z) - ThinkRepair: Self-Directed Automated Program Repair [11.598008952093487]
Large language models (LLMs) instructed by prompt engineering have attracted much attention for their powerful ability to address many kinds of tasks including bug-fixing.
We propose ThinkRepair, a self-directed LLM-based automated program repair approach with two main phases: a collection phase and a fixing phase.
Evaluations on two widely studied datasets (Defects4J and QuixBugs), comparing ThinkRepair against 12 SOTA APR techniques, indicate the superiority of ThinkRepair in fixing bugs.
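A hedged sketch of ThinkRepair's two phases as summarized above; the names, prompt wording, and retry protocol are assumptions rather than the paper's code.

```python
def collection_phase(training_bugs, query_llm, run_tests):
    """Collect chain-of-thought demonstrations that the test suite validates."""
    pool = []
    for bug in training_bugs:
        answer = query_llm(f"Think step by step, then fix:\n{bug.code}")
        ok, _ = run_tests(bug, answer)
        if ok:                               # keep only validated fixes
            pool.append((bug.code, answer))  # (buggy code, CoT + patch)
    return pool

def fixing_phase(bug, pool, select_similar, query_llm, run_tests, rounds=3):
    """Few-shot repair with similar validated examples; retry with feedback."""
    shots = select_similar(bug.code, pool, k=2)
    prompt = "\n\n".join(f"Bug:\n{b}\nFix:\n{f}" for b, f in shots)
    prompt += f"\n\nBug:\n{bug.code}\nThink step by step, then fix:"
    for _ in range(rounds):
        patch = query_llm(prompt)
        ok, feedback = run_tests(bug, patch)
        if ok:
            return patch
        prompt += f"\n{patch}\nTests failed: {feedback}\nTry again:"
    return None
```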
arXiv Detail & Related papers (2024-07-30T15:17:07Z) - Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks.
Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation.
We investigate the benefits of distilling code repair for both high- and low-resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z) - A Unified Debugging Approach via LLM-Based Multi-Agent Synergy [39.11825182386288]
FixAgent is an end-to-end framework for unified debugging through multi-agent synergy.
It significantly outperforms state-of-the-art repair methods, fixing 1.25$\times$ to 2.56$\times$ as many bugs on the repo-level benchmark Defects4J.
arXiv Detail & Related papers (2024-04-26T04:55:35Z) - Aligning the Objective of LLM-based Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair. Our core insight is that LLMs' APR capability can be greatly improved by simply aligning the output with their training objective. Based on this insight, we designed D4C, a straightforward prompting framework for APR.
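As we read the alignment idea, the prompt asks the model to regenerate the whole function, matching the free-form code generation LLMs are trained on, rather than to emit a diff. A minimal sketch under that assumption (the prompt wording is ours, not D4C's):

```python
def build_aligned_prompt(buggy_function: str, failing_test_output: str) -> str:
    # Ask for a complete rewritten function -- the format code LLMs were
    # trained to produce -- instead of a patch in diff format.
    return (
        "The following function is buggy.\n"
        f"{buggy_function}\n"
        f"Failing test output:\n{failing_test_output}\n"
        "Rewrite the complete, corrected function (not a diff):\n"
    )
```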
arXiv Detail & Related papers (2024-04-13T02:36:40Z) - ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs [23.419180504723546]
ContrastRepair is a novel approach that augments conversation-driven APR by providing contrastive test pairs.
We evaluate ContrastRepair on multiple benchmark datasets, including Defects4J, QuixBugs, and HumanEval-Java.
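The core idea, as described, is to pair a failing test with a similar passing test so the model sees a minimal behavioral contrast. A small sketch, with the similarity metric (difflib) and the prompt as our assumptions:

```python
import difflib

def most_similar_passing(failing_test: str, passing_tests: list[str]) -> str:
    """Pick the passing test closest in text to the failing one."""
    return max(
        passing_tests,
        key=lambda t: difflib.SequenceMatcher(None, failing_test, t).ratio(),
    )

def contrastive_prompt(buggy_code: str, failing_test: str,
                       passing_tests: list[str]) -> str:
    twin = most_similar_passing(failing_test, passing_tests)
    return (
        f"Buggy code:\n{buggy_code}\n"
        f"This test FAILS:\n{failing_test}\n"
        f"This similar test PASSES:\n{twin}\n"
        "Explain the behavioral difference and produce a fixed version."
    )
```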
arXiv Detail & Related papers (2024-03-04T12:15:28Z) - A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
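A compact sketch of RTT under that definition; query_llm and the prompt phrasing are placeholders, and Java is assumed only as an example source language:

```python
def round_trip_repair(buggy_code: str, query_llm,
                      pivot: str = "natural language") -> str:
    # Forward pass: translate the buggy code into the pivot representation.
    forward = query_llm(
        f"Translate this Java code to {pivot}, preserving its intent:\n{buggy_code}"
    )
    # Backward pass: regenerate code from the pivot; errors tend to wash out.
    back = query_llm(
        f"Translate the following {pivot} back into Java:\n{forward}"
    )
    return back  # candidate patch, still to be validated against the test suite
```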
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair [8.321263361036808]
We propose RepairLLaMA, a novel program repair approach that identifies optimal code representations for APR with fine-tuned models. This results in a highly effective 'program repair adapter' for fixing bugs with AI. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs.
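The general recipe (a parameter-efficient repair adapter on a code LLM) can be sketched with off-the-shelf libraries; the base model choice and LoRA hyperparameters below are assumptions, not the paper's exact configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed base model; RepairLLaMA's actual setup may differ.
base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
adapter_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, adapter_cfg)  # trains a small adapter, not the base

# Training pairs would map an input code representation (e.g. the buggy
# function with the suspicious region marked) to an output representation
# (the fix) -- the representation-engineering question the paper studies.
```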
arXiv Detail & Related papers (2023-12-25T11:39:46Z) - Lyra: Orchestrating Dual Correction in Automated Theorem Proving [63.115422781158934]
Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
arXiv Detail & Related papers (2023-09-27T17:29:41Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
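A simplified sketch of the retrieval-augmentation step: retrieve the most relevant historical bug-fix pair and prepend it to the buggy input before generation. The BM25-only retriever and the <FIX>/<SEP> separators are our simplifications of the paper's hybrid retriever and input format:

```python
from rank_bm25 import BM25Okapi

def build_rag_input(buggy_code: str, history: list[tuple[str, str]]) -> str:
    """history: (buggy, fixed) pairs from past bug fixes."""
    bm25 = BM25Okapi([bug.split() for bug, _ in history])
    scores = bm25.get_scores(buggy_code.split())
    best = max(range(len(history)), key=scores.__getitem__)
    prev_bug, prev_fix = history[best]
    # Augmented input: retrieved fix pattern prepended to the new buggy code.
    return f"{prev_bug} <FIX> {prev_fix} <SEP> {buggy_code}"
```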
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
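A compact sketch of the BIFI loop as described above; train and apply are placeholders for model training and inference, and critic(code) returns True iff the code is good (e.g. it parses or compiles):

```python
def bifi(fixer, bad_examples, good_examples, critic, train, apply, rounds=2):
    paired = []
    for _ in range(rounds):
        # 1. Run the fixer on real bad code; keep critic-approved fixes.
        paired += [(b, f) for b in bad_examples
                   if critic(f := apply(fixer, b))]
        # 2. Train a breaker on the inverted pairs (good -> bad).
        breaker = train([(f, b) for b, f in paired])
        # 3. Use the breaker to create realistic bad code from good code.
        paired += [(apply(breaker, g), g) for g in good_examples]
        # 4. Retrain the fixer on all accumulated paired data.
        fixer = train(paired)
    return fixer
```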
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.