ContrastRepair: Enhancing Conversation-Based Automated Program Repair
via Contrastive Test Case Pairs
- URL: http://arxiv.org/abs/2403.01971v2
- Date: Thu, 7 Mar 2024 05:33:36 GMT
- Title: ContrastRepair: Enhancing Conversation-Based Automated Program Repair
via Contrastive Test Case Pairs
- Authors: Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du,
Qi Guo
- Abstract summary: ContrastRepair is a novel APR approach that augments conversation-driven APR by providing contrastive test pairs.
We evaluate ContrastRepair on multiple benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java.
- Score: 23.419180504723546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated Program Repair (APR) aims to automatically generate patches for
rectifying software bugs. Recent strides in Large Language Models (LLMs), such
as ChatGPT, have yielded encouraging outcomes in APR, especially within the
conversation-driven APR framework. Nevertheless, the efficacy of
conversation-driven APR is contingent on the quality of the feedback
information. In this paper, we propose ContrastRepair, a novel
conversation-based APR approach that augments conversation-driven APR by
providing LLMs with contrastive test pairs. A test pair consists of a failing
test and a passing test, which offer contrastive feedback to the LLM. Our key
insight is to minimize the difference between the generated passing test and
the given failing test, which can better isolate the root causes of bugs. By
providing informative and specific feedback, ContrastRepair enables the LLM to
produce effective bug fixes. The implementation of ContrastRepair is based on
the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT
until plausible patches are generated. We evaluate ContrastRepair on multiple
benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java. The
results demonstrate that ContrastRepair significantly outperforms existing
methods, achieving a new state-of-the-art in program repair. For instance,
among Defects4j 1.2 and 2.0, ContrastRepair correctly repairs 143 out of all
337 bug cases, while the best-performing baseline fixes 124 bugs.
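The contrastive-pair idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the difference between tests is measured or how passing tests are generated, so this example assumes a pool of candidate passing inputs already exists and uses character-level similarity from Python's `difflib` as a stand-in metric. The function names `similarity`, `pick_contrastive_pair`, and `build_feedback`, the prompt wording, and the `gcd` inputs are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def pick_contrastive_pair(failing_input: str, passing_inputs: list[str]) -> tuple[str, str]:
    """Pair the failing input with the passing input that differs from it least."""
    closest = max(passing_inputs, key=lambda p: similarity(failing_input, p))
    return failing_input, closest


def build_feedback(failing: str, passing: str) -> str:
    """Format the pair as feedback text for the next conversation turn (hypothetical wording)."""
    return (
        f"The function fails on input: {failing}\n"
        f"The function passes on the similar input: {passing}\n"
        "Please fix the function so that both inputs are handled correctly."
    )


if __name__ == "__main__":
    # Hypothetical test inputs for a buggy gcd implementation.
    failing = "gcd(0, 5)"
    passing_candidates = ["gcd(12, 18)", "gcd(1, 5)", "gcd(144, 89)"]
    fail, closest_pass = pick_contrastive_pair(failing, passing_candidates)
    print(build_feedback(fail, closest_pass))
```

Here the candidate `gcd(1, 5)` is chosen because it differs from the failing input in a single character, which points the model toward the zero argument as the likely trigger. In a conversation-driven loop, feedback like this would be sent to the LLM after each patch that still fails validation.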
Related papers
- Towards Practical and Useful Automated Program Repair for Debugging [4.216808129651161]
PracAPR is an interactive repair system that works in an Integrated Development Environment (IDE).
PracAPR does not require a test suite or program re-execution.
arXiv Detail & Related papers (2024-07-12T03:19:54Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- Re-ReST: Reflection-Reinforced Self-Training for Language Agents [101.22559705696885]
Self-training in language agents can generate supervision from the agent itself.
We present Reflection-Reinforced Self-Training (Re-ReST), which uses a reflector to refine low-quality generated samples.
arXiv Detail & Related papers (2024-06-03T16:21:38Z)
- Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair [19.123640635549524]
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of software engineering tasks.
This paper reviews the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives.
ChatGPT is able to fix 109 out of 151 buggy programs using the basic prompt within 35 independent rounds, outperforming state-of-the-art LLMs CodeT5 and PLBART by 27.5% and 62.4% in prediction accuracy.
arXiv Detail & Related papers (2023-10-13T06:11:47Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond [135.8013388183257]
We propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits.
Most LLMs struggle on SummEdits, with performance close to random chance.
The best-performing model, GPT-4, is still 8% below estimated human performance.
arXiv Detail & Related papers (2023-05-23T21:50:06Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT [10.071615423169902]
Automated Program Repair (APR) aims to automatically generate patches for buggy programs.
Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR.
We propose ChatRepair, the first fully automated conversation-driven APR approach.
arXiv Detail & Related papers (2023-04-01T20:57:33Z)
- Conversational Automated Program Repair [10.071615423169902]
We propose a new paradigm for program repair that alternates between patch generation and validation in a conversational manner.
We leverage the long-term context window of Large Pre-Trained Language Models to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test.
arXiv Detail & Related papers (2023-01-30T19:22:36Z)