Related papers: Specification Vibing for Automated Program Repair

Specification Vibing for Automated Program Repair

URL: http://arxiv.org/abs/2602.08263v1
Date: Mon, 09 Feb 2026 04:44:58 GMT
Title: Specification Vibing for Automated Program Repair
Authors: Taohong Zhu, Lucas C. Cordeiro, Mustafa A. Mustafa, Youcheng Sun,
Abstract summary: VibeRepair is a specification-centric APR technique that treats repair as behavior-specification repair rather than ad-hoc code editing.<n>On Defects4J v1.2, VibeRepair correctly repairs 174 bugs, exceeding the strongest state-of-the-art baseline by 28 bugs.<n>On Defects4J v2.0, it repairs 178 bugs, outperforming prior approaches by 33 bugs, representing a 23% improvement.
Score: 8.68148153927532
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM)-driven automated program repair (APR) has advanced rapidly, but most methods remain code-centric: they directly rewrite source code and thereby risk hallucinated, behaviorally inconsistent fixes. This limitation suggests the need for an alternative repair paradigm that relies on a representation more accessible to LLMs than raw code, enabling more accurate understanding, analysis, and alignment during repair. To address this gap, we propose VibeRepair, a specification-centric APR technique that treats repair as behavior-specification repair rather than ad-hoc code editing. VibeRepair first translates buggy code into a structured behavior specification that captures the program's intended runtime behavior, then infers and repairs specification misalignments, and finally synthesizes code strictly guided by the corrected behavior specification. An on-demand reasoning component enriches hard cases with program analysis and historical bug-fix evidence while controlling cost. Across Defects4J and real-world benchmarks and multiple LLMs, VibeRepair demonstrates consistently strong repair effectiveness with a significantly smaller patch space. On Defects4J v1.2, VibeRepair correctly repairs 174 bugs, exceeding the strongest state-of-the-art baseline by 28 bugs, which corresponds to a 19% improvement. On Defects4J v2.0, it repairs 178 bugs, outperforming prior approaches by 33 bugs, representing a 23% improvement. Evaluations on real-world benchmarks collected after the training period of selected LLMs further confirm its effectiveness and generalizability. By centering repair on explicit behavioral intent, VibeRepair reframes APR for the era of "vibe" coding: make the behavior sing, and the code will follow.

Related papers

RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code [11.74568238259256]
RelRepair retrieves relevant project-specific code to enhance automated program repair.<n>We evaluate RelRepair on two widely studied datasets, Defects4J V1.2 and ManySStuBs4J.
arXiv Detail & Related papers (2025-09-20T14:07:28Z)
Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search [41.50068103527948]
We propose ReinFix, a framework that searches for repair ingredients throughout the reasoning and solution phases of bug fixing.<n>During the solution phase, ReinFix searches for external ingredients from historical bug fixes with similar bug patterns.<n> Evaluations on two popular benchmarks demonstrate the effectiveness of our approach over SOTA baselines.
arXiv Detail & Related papers (2025-06-29T06:02:11Z)
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline [58.832237984587664]
We develop VNLI-Critique, a model for automated sentence-level factuality classification and critique generation.<n>We highlight three key applications: (1) VNLI-Critique demonstrates robust generalization, validated by state-of-the-art performance on the M-HalDetect benchmark; (2) The VNLI-Critique driven AutoRater for DOCCI-Critique provides reliable VLM rankings, showing excellent alignment with human factuality judgments; and (3) An innovative Critic-and-Revise pipeline, achieves substantial improvements in caption factuality.
arXiv Detail & Related papers (2025-06-09T10:57:26Z)
ThinkRepair: Self-Directed Automated Program Repair [11.598008952093487]
Large language models (LLMs) instructed by prompt engineering have attracted much attention for their powerful ability to address many kinds of tasks including bug-fixing. We propose a self-directed LLM-based automated program repair, ThinkRepair, with two main phases: collection phase and fixing phase. Evaluations on two widely studied datasets (Defects4J and QuixBugs) by comparing ThinkRepair with 12 SOTA APRs indicate the priority of ThinkRepair in fixing bugs.
arXiv Detail & Related papers (2024-07-30T15:17:07Z)
Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation. We investigate the benefits of distilling code repair for both high and low resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs [23.419180504723546]
ContrastRepair is a novel APR approach that augments conversation-driven APR by providing contrastive test pairs. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java.
arXiv Detail & Related papers (2024-03-04T12:15:28Z)
Assessing the Latent Automated Program Repair Capabilities of Large Language Models using Round-Trip Translation [44.3761164214368]
We investigate Round-Trip Translation (RTT): translating code from one programming language into another programming or natural language and back.<n>We perform a detailed quantitative and qualitative analysis of RTT-generated patches in Java.<n>We find that RTT through English generates plausible patches for 100 of 164 bugs with GPT-4 on the HumanEval-Java benchmark, and 97 are found to be correct in our manual assessment.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair [6.8144965833113815]
We propose RepairLLaMA, a novel program repair approach that identifies optimal code representations for APR with fine-tuned models.<n>This results in a highly effective program repair adapter' for fixing bugs with AI.<n>Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs.
arXiv Detail & Related papers (2023-12-25T11:39:46Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.