Related papers: Repair Is Nearly Generation: Multilingual Program Repair with LLMs

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

URL: http://arxiv.org/abs/2208.11640v1
Date: Wed, 24 Aug 2022 16:25:58 GMT
Title: Repair Is Nearly Generation: Multilingual Program Repair with LLMs
Authors: Harshit Joshi, Jos\'e Cambronero, Sumit Gulwani, Vu Le, Ivan Radicek, Gust Verbruggen
Abstract summary: We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. Taking inspiration from the way programmers manually fix bugs, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking, can successfully repair programs in multiple domains with minimal effort.
Score: 9.610685299268825
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most programmers make mistakes when writing code. Some of these mistakes are small and require few edits to the original program - a class of errors recently termed last mile mistakes. These errors break the flow for experienced developers and can stump novice programmers. Existing automated repair techniques targeting this class of errors are domain-specific and do not easily carry over to new domains. Transferring symbolic approaches requires substantial engineering and neural approaches require data and retraining. We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. Such a multilingual engine enables a flipped model for programming assistance, one where the programmer writes code and the AI assistance suggests fixes, compared to traditional code suggestion technology. Taking inspiration from the way programmers manually fix bugs, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking, can successfully repair programs in multiple domains with minimal effort. We present the first results for such a multilingual repair engine by evaluating on 6 different domains and comparing performance to domain-specific repair engines. We show that RING can outperform domain-specific repair engines in 3 of these domains. We also identify directions for future research using LLMCs for multilingual repair.

Related papers

Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language.<n>We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program.<n>We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models [48.42142115255159]
We release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task. We evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs. Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings.
arXiv Detail & Related papers (2024-09-21T21:06:25Z)
Revisiting Evolutionary Program Repair via Code Language Model [11.711739409758476]
This paper introduces ARJA-CLM, which integrates the multiobjective evolutionary algorithm with CLM to fix multilocation bugs in Java projects. We also propose a context-aware prompt construction stratege, which enriches the prompt with additional information about accessible fields and methods for the CLM generating candidate statements.
arXiv Detail & Related papers (2024-08-20T01:57:45Z)
Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation. We investigate the benefits of distilling code repair for both high and low resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair [8.321263361036808]
We propose RepairLLaMA, a novel program repair approach that identifies optimal code representations for APR with fine-tuned models. This results in a highly effective program repair adapter' for fixing bugs with AI. Overall, RepairLLaMA correctly fixes 144 Defects4J v2 and 109 HumanEval-Java bugs, outperforming all baselines.
arXiv Detail & Related papers (2023-12-25T11:39:46Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Repairing Bugs in Python Assignments Using Large Language Models [9.973714032271708]
We propose to use a large language model trained on code to build an APR system for programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking. We evaluate MMAPR on 286 real student programs and compare to a baseline built by combining a state-of-the-art Python syntax repair engine, BIFI, and state-of-the-art Python semantic repair engine for student assignments, Refactory.
arXiv Detail & Related papers (2022-09-29T15:41:17Z)
Neurosymbolic Repair for Low-Code Formula Languages [12.986749944196402]
Most users of low-code platforms, such as Excel and PowerApps, write programs in domain-specific formula languages. We develop LaMirage, a LAst-MIle RepAir-engine GEnerator that combines symbolic and neural techniques to perform last-mile repair. We compare LaMirage to state-of-the-art neural and symbolic approaches on 400 real Excel and PowerFx formulas.
arXiv Detail & Related papers (2022-07-24T15:56:03Z)
BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization. We provide a general benchmark with a diversity of real and synthetic Java bugs. We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback. We then apply a graph neural network on top to model the reasoning process. We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.