Automated Bug Generation in the era of Large Language Models
- URL: http://arxiv.org/abs/2310.02407v1
- Date: Tue, 3 Oct 2023 20:01:51 GMT
- Title: Automated Bug Generation in the era of Large Language Models
- Authors: Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, Reyhaneh Jabbarvand
- Abstract summary: We propose BugFarm to transform arbitrary code into multiple complex bugs.
BugFarm generates bugs that are hard to detect by learning-based bug prediction approaches and hard to repair by SOTA learning-based program repair techniques.
- Score: 6.519768481767584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bugs are essential in software engineering; many research studies over the past
decades have proposed techniques to detect, localize, and repair bugs in software
systems. Evaluating the effectiveness of such techniques requires complex bugs,
i.e., those that are hard to detect through testing and hard to repair through
debugging. From the classic software engineering point of view, a
hard-to-repair bug differs from the correct code in multiple locations, making
it hard to localize and repair. Hard-to-detect bugs, on the other hand,
manifest themselves only under specific test inputs and reachability conditions.
These two objectives, i.e., generating hard-to-detect and hard-to-repair bugs,
are mostly aligned; a bug generation technique can change multiple statements
that are covered only under a specific set of inputs. However, these two
objectives conflict for learning-based techniques: a bug should have a
code representation similar to that of the correct code in the training data to
challenge a bug prediction model to distinguish them. The hard-to-repair
definition remains the same, but with a caveat: the more a bug differs from the
original code (at multiple locations), the more distant their representations
are and the easier the bug is to detect. We propose BugFarm to transform arbitrary code
into multiple complex bugs. BugFarm leverages LLMs to mutate code in multiple
locations (hard-to-repair). To ensure that multiple modifications do not
notably change the code representation, BugFarm analyzes the attention of the
underlying model and instructs LLMs to change only the least attended locations
(hard-to-detect). Our comprehensive evaluation of 320k+ bugs from over 2.5M
mutants generated by BugFarm and two alternative approaches demonstrates its
superiority in generating bugs that are hard to detect by learning-based bug
prediction approaches and hard to repair by SOTA learning-based program repair
techniques.
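The core selection step the abstract describes, picking the least attended code locations as mutation sites, can be sketched as follows. This is a minimal illustration, not BugFarm's implementation: the attention tensor here is synthetic, whereas in the paper it would come from the underlying bug-prediction model, and the function name and aggregation (mean over heads, column sums) are assumptions for illustration.

```python
import numpy as np

def least_attended_tokens(attention, tokens, k=2):
    """Return the k tokens that receive the least total attention.

    attention: array of shape (heads, seq_len, seq_len), where
               attention[h, i, j] is how much token i attends to token j.
    tokens:    list of seq_len token strings.
    """
    # Average over heads, then sum the attention each token *receives*
    # (column sums of the averaged matrix).
    received = attention.mean(axis=0).sum(axis=0)
    order = np.argsort(received)  # ascending: least attended first
    return [tokens[i] for i in order[:k]]

# Synthetic example: random attention over a tiny token sequence.
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
rng = np.random.default_rng(0)
attn = rng.random((4, len(tokens), len(tokens)))
sites = least_attended_tokens(attn, tokens, k=3)
print(sites)
```

In the paper's setting, the selected locations would then be handed to an LLM with an instruction to mutate only those spots, keeping the overall code representation close to the original.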
Related papers
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection [8.79879909193717]
We introduce PreciseBugCollector, a precise, multi-language bug collection approach.
It is based on two novel components: a bug tracker to map the repositories with external bug repositories to trace bug type information, and a bug injector to generate project-specific bugs.
To date, PreciseBugCollector comprises 1,057,818 bugs extracted from 2,968 open-source projects.
arXiv Detail & Related papers (2023-09-12T13:47:44Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Large Language Models of Code Fail at Completing Code with Potential
Bugs [30.80172644795715]
We study the buggy-code completion problem inspired by real-time code suggestion.
We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs.
arXiv Detail & Related papers (2023-06-06T06:35:27Z) - WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised
Learning [37.09621161662761]
This paper proposes a WEakly supervised bug LocaLization (WELL) method to train a bug localization model.
With CodeBERT fine-tuned on buggy-or-not binary labeled data, WELL addresses bug localization in a weakly supervised manner.
arXiv Detail & Related papers (2023-05-27T06:34:26Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z) - Self-Supervised Bug Detection and Repair [27.46717890823656]
We present BugLab, an approach for self-supervised learning of bug detection and repair.
A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs.
arXiv Detail & Related papers (2021-05-26T18:41:05Z) - Advaita: Bug Duplicity Detection System [1.9624064951902522]
The duplicate bug rate (% of duplicate bugs) ranges from single digits (1 to 9%) to double digits (40%), depending on product maturity, code size, and the number of engineers working on the project.
Detecting duplicity deals with identifying whether any two bugs convey the same meaning.
This approach considers multiple sets of features, viz., basic text statistical features, semantic features, and contextual features.
arXiv Detail & Related papers (2020-01-24T04:48:39Z)
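One ingredient of the duplicate-bug detection described in the Advaita entry, comparing two reports on basic text statistics, can be sketched with TF-IDF vectors and cosine similarity. This toy version covers only the text-statistics part (the paper also uses semantic and contextual features, which are not modeled here), and all report texts below are invented examples.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a smoothed TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (tf[t] / len(toks)) * math.log((1 + n) / (1 + df[t]))
                     for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

reports = [
    "app crashes on login with null pointer",       # report A
    "crash during login null pointer exception",    # likely duplicate of A
    "dark mode colors are wrong in settings",       # unrelated
]
vecs = tfidf_vectors(reports)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A duplicate detector would threshold or rank such similarity scores; here the likely duplicate scores higher against report A than the unrelated report does.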
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.