Revisiting the Plastic Surgery Hypothesis via Large Language Models
- URL: http://arxiv.org/abs/2303.10494v1
- Date: Sat, 18 Mar 2023 20:33:46 GMT
- Title: Revisiting the Plastic Surgery Hypothesis via Large Language Models
- Authors: Chunqiu Steven Xia, Yifeng Ding, Lingming Zhang
- Abstract summary: We propose FitRepair, which combines the direct usage of Large Language Models with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR.
Our experiments on the widely studied Defects4J 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs, respectively.
- Score: 9.904030364454563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated Program Repair (APR) aspires to automatically generate patches for
an input buggy program. Traditional APR tools typically focus on specific bug
types and fixes through the use of templates, heuristics, and formal
specifications. However, these techniques are limited in terms of the bug types
and patch variety they can produce. As such, researchers have designed various
learning-based APR tools with recent work focused on directly using Large
Language Models (LLMs) for APR. While LLM-based APR tools are able to achieve
state-of-the-art performance on many repair datasets, the LLMs used for direct
repair are not fully aware of the project-specific information such as unique
variable or method names.
The plastic surgery hypothesis is a well-known insight for APR, which states
that the code ingredients to fix the bug usually already exist within the same
project. Traditional APR tools have largely leveraged the plastic surgery
hypothesis by designing manual or heuristic-based approaches to exploit such
existing code ingredients. However, as recent APR research has shifted toward
LLM-based approaches, the plastic surgery hypothesis has been largely ignored.
In this paper, we ask the following question: How useful is the plastic surgery
hypothesis in the era of LLMs? Interestingly, LLM-based APR presents a unique
opportunity to fully automate the plastic surgery hypothesis via fine-tuning
and prompting. To this end, we propose FitRepair, which combines the direct
usage of LLMs with two domain-specific fine-tuning strategies and one prompting
strategy for more powerful APR. Our experiments on the widely studied Defects4J
1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs (substantially
outperforming the best-performing baseline by 15 and 8), respectively,
demonstrating a promising future of the plastic surgery hypothesis in the era
of LLMs.
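To make the idea concrete, the sketch below shows one way the plastic surgery hypothesis could be automated in an LLM prompt: mine identifier "ingredients" from the project's source files and surface the most lexically similar ones as hints in a cloze-style repair prompt. This is a minimal, hypothetical sketch, not FitRepair's actual fine-tuning or prompting strategies; the function names, the bigram-overlap ranking, and the prompt layout are all assumptions for illustration.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def extract_ingredients(project_files):
    """Collect identifier tokens (candidate 'code ingredients') from the
    project's source files, per the plastic surgery hypothesis.
    `project_files` maps file names to source text."""
    ingredients = set()
    for source in project_files.values():
        ingredients.update(IDENT.findall(source))
    return ingredients

def build_repair_prompt(buggy_line, ingredients, k=5):
    """Rank ingredients by character-bigram overlap with the buggy line
    (a crude stand-in for a real relevance model) and surface the top-k
    as hints in a fill-in-the-blank repair prompt."""
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}
    line_grams = bigrams(buggy_line)
    ranked = sorted(ingredients,
                    key=lambda ing: len(bigrams(ing) & line_grams),
                    reverse=True)
    hints = ", ".join(ranked[:k])
    return (f"// relevant project identifiers: {hints}\n"
            f"// buggy line:\n{buggy_line}\n"
            f"// fixed line:\n")

# Hypothetical one-file project and buggy line, for illustration only.
project = {
    "Bounds.java": "int clampIndex(int idx, int maxIndex)"
                   " { return Math.min(idx, maxIndex); }",
}
prompt = build_repair_prompt("return values[idx + 1];",
                             extract_ingredients(project))
print(prompt)
```

The ranked hints steer the model toward project-specific names (e.g. `clampIndex`) that a generically trained LLM would be unlikely to produce on its own, which is exactly the gap the hypothesis targets.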
Related papers
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
Large Language Models (LLMs) have demonstrated great potential as generalist assistants.
It is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts.
In this paper, we observe that directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs.
arXiv Detail & Related papers (2024-07-11T17:52:03Z) - Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z) - Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
arXiv Detail & Related papers (2024-04-23T17:12:45Z) - Aligning LLMs for FL-free Program Repair [14.935596175148586]
This paper investigates a new approach to adapt large language models (LLMs) to program repair.
Our core insight is that an LLM's APR capability can be greatly improved by simply aligning the output to its training objective.
Based on this insight, we designed D4C, a straightforward prompting framework for APR.
arXiv Detail & Related papers (2024-04-13T02:36:40Z) - LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery [57.358568111574314]
Patient data privacy often restricts the availability of old data when updating the model.
Prior CL studies overlooked two vital problems in the surgical domain.
This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology.
arXiv Detail & Related papers (2024-02-26T15:35:24Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - Evaluating Pre-trained Language Models for Repairing API Misuses [15.17607624946389]
API misuses often lead to software bugs, crashes, and vulnerabilities.
In a recent study, test-suite-based automatic program repair (APR) tools were found to be ineffective in repairing API misuses.
We conduct a comprehensive empirical study on 11 learning-aided APR tools, comprising nine state-of-the-art general-purpose PLMs and two APR tools.
Our results show that PLMs perform better than the studied APR tools in repairing API misuses.
arXiv Detail & Related papers (2023-10-25T06:10:22Z) - Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - Conversational Automated Program Repair [10.071615423169902]
We propose a new paradigm for program repair that alternates between patch generation and validation in a conversational manner.
We leverage the long-term context window of Large Pre-Trained Language Models to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test.
arXiv Detail & Related papers (2023-01-30T19:22:36Z)
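The alternating generate-and-validate paradigm described in the Conversational Automated Program Repair entry can be sketched as a simple feedback loop. Here `generate` and `validate` are hypothetical caller-supplied stand-ins (an LLM call and a test-suite run in practice), and all names are assumptions for illustration rather than that paper's actual interface.

```python
def conversational_repair(generate, validate, buggy_code, max_turns=5):
    """Alternate patch generation and validation, feeding each failure
    message back into the conversation history so the model avoids
    repeating previously incorrect patches."""
    history = [("user", f"Fix this bug:\n{buggy_code}")]
    for _ in range(max_turns):
        patch = generate(history)          # LLM proposes a patch
        history.append(("assistant", patch))
        ok, feedback = validate(patch)     # e.g. run the test suite
        if ok:
            return patch
        # Validation feedback becomes the next conversational turn.
        history.append(("user", f"Patch failed: {feedback}. Try again."))
    return None  # no plausible patch within the turn budget

# Toy demonstration with canned patches in place of an LLM.
attempts = iter(["x = 1", "x = 2"])
result = conversational_repair(
    generate=lambda history: next(attempts),
    validate=lambda patch: (patch == "x = 2", "expected x == 2"),
    buggy_code="x = 0",
)
print(result)  # x = 2
```

The key design point is that validation output is folded back into the prompt context, so each retry is conditioned on what already failed, rather than sampling patches independently.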
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.