Related papers: The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

URL: http://arxiv.org/abs/2312.17485v1
Date: Fri, 29 Dec 2023 06:12:15 GMT
Title: The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model
Authors: Zelin Zhao, Zhaogui Xu, Jialong Zhu, Peng Di, Yuan Yao, Xiaoxing Ma
Abstract summary: Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments.
Score: 15.885824575879763
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective utilization of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets from human reviewers and automated checkers. Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.

Related papers

FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks [28.849481030601666]
We introduce FeedbackEval, a benchmark for evaluating large language models' feedback comprehension and performance. We conduct a comprehensive empirical study on five state-of-the-art LLMs, including GPT-4o, Claude-3.5, Gemini-1.5, GLM-4, and Qwen2.5. Our results show that structured feedback, particularly in the form of test feedback, leads to the highest repair success rates, while unstructured feedback proves significantly less effective.
arXiv Detail & Related papers (2025-04-09T14:43:08Z)
Can LLMs Automate Fact-Checking Article Writing? [69.90165567819656]
We argue for the need to extend the typical automatic fact-checking pipeline with automatic generation of full fact-checking articles. We develop QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers.
arXiv Detail & Related papers (2025-03-22T07:56:50Z)
Studying and Understanding the Effectiveness and Failures of Conversational LLM-Based Repair [3.93048798243871]
Automated program repair (APR) is designed to automate the process of bug-fixing. Advanced APR techniques powered by conversational language models (LLMs) have exhibited impressive repair abilities. Despite the superiority, conversational APR techniques still fail to repair a large number of bugs.
arXiv Detail & Related papers (2025-03-19T09:39:32Z)
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification [53.80183105328448]
Refine via Intrinsic Self-Verification (ReVISE) is an efficient framework that enables LLMs to self-correct their outputs through self-verification. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
arXiv Detail & Related papers (2025-02-20T13:50:02Z)
Enabling Scalable Oversight via Self-Evolving Critic [59.861013614500024]
SCRIT (Self-evolving CRITic) is a framework that enables genuine self-evolution of critique abilities. It self-improves by training on synthetic data, generated by a contrastive-based self-critic. It achieves up to a 10.3% improvement on critique-correction and error identification benchmarks.
arXiv Detail & Related papers (2025-01-10T05:51:52Z)
FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments [21.848112758958543]
We propose FastFixer, an efficient and effective approach for programming assignment repair. We first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context. Considering the repair efficiency, FastFixer achieves a remarkable inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
arXiv Detail & Related papers (2024-10-11T10:17:02Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Learning to Refine with Fine-Grained Natural Language Feedback [81.70313509881315]
We propose looking at refinement with feedback as a composition of three distinct LLM competencies. A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors. We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries.
arXiv Detail & Related papers (2024-07-02T16:15:01Z)
On The Effectiveness of Dynamic Reduction Techniques in Automated Program Repair [1.7767466724342067]
We describe a program repair framework that effectively handles large-scale buggy programs of industrial complexity. The framework exploits program reduction in the form of program slicing to eliminate parts of the code irrelevant to the bug being repaired. Our empirical results on the widely used Defects4J dataset reveal that a substantial improvement in performance can be obtained without any degradation in repair quality.
arXiv Detail & Related papers (2024-06-23T21:35:07Z)
Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation. We investigate the benefits of distilling code repair for both high and low resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback [7.742213291781287]
We present VRpilot, a vulnerability repair technique based on reasoning and patch validation feedback. Our results show that VRpilot generates, on average, 14% and 7.6% more correct patches than the baseline techniques on C and Java.
arXiv Detail & Related papers (2024-05-24T16:29:48Z)
How Far Can We Go with Practical Function-Level Program Repair? [11.71750828464698]
This paper investigates the effect of few-shot learning mechanism and the auxiliary repair-relevant information on function-level APR. We propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the power of the auxiliary repair-relevant information.
arXiv Detail & Related papers (2024-04-19T12:14:09Z)
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models [95.96734086126469]
Large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications. For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work. We perform a detailed coarse-to-fine analysis of the inference performance of various code libraries.
arXiv Detail & Related papers (2024-04-17T15:57:50Z)
An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications [5.395614997568524]
This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing declarative specifications in Alloy. We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent. Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in terms of repair efficacy, albeit with a marginal increase in runtime and token usage.
arXiv Detail & Related papers (2024-04-17T03:46:38Z)
Large Language Models Cannot Self-Correct Reasoning Yet [78.16697476530994]
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities. Concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues.
arXiv Detail & Related papers (2023-10-03T04:56:12Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.