Can we learn from developer mistakes? Learning to localize and repair
real bugs from real bug fixes
- URL: http://arxiv.org/abs/2207.00301v1
- Date: Fri, 1 Jul 2022 09:49:17 GMT
- Title: Can we learn from developer mistakes? Learning to localize and repair
real bugs from real bug fixes
- Authors: Cedric Richter and Heike Wehrheim
- Abstract summary: We introduce RealiT, a pre-train-and-fine-tune approach for learning to localize and repair real bugs from real bug fixes.
We found that training on real bug fixes with RealiT is empirically powerful by nearly doubling the localization performance of an existing model on real bugs.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real bug fixes found in open source repositories seem to be the perfect
source for learning to localize and repair real bugs. However, the absence of
large scale bug fix collections has made it difficult to effectively exploit
real bug fixes in the training of larger neural models in the past. In
contrast, artificial bugs -- produced by mutating existing source code -- can
be easily obtained at a sufficient scale and are therefore often preferred in
the training of existing approaches. Still, localization and repair models that
are trained on artificial bugs usually underperform when faced with real bugs.
This raises the question of whether bug localization and repair models trained
on real bug fixes are more effective at localizing and repairing real bugs.
We address this question by introducing RealiT, a pre-train-and-fine-tune
approach for effectively learning to localize and repair real bugs from real
bug fixes. RealiT is first pre-trained on a large number of artificial bugs
produced by traditional mutation operators and then fine-tuned on a smaller set
of real bug fixes. Fine-tuning does not require any modifications of the
learning algorithm and hence can be easily adopted in various training
scenarios for bug localization or repair (even when real training data is
scarce). In addition, we found that training on real bug fixes with RealiT is
empirically powerful by nearly doubling the localization performance of an
existing model on real bugs while maintaining or even improving the repair
performance.
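The pre-train-and-fine-tune recipe above relies on artificial bugs produced by traditional mutation operators. As an illustrative sketch only (not the authors' implementation), the following Python example shows one such operator, flipping a comparison operator to inject an artificial fault into otherwise correct code:

```python
import ast

class ComparisonMutator(ast.NodeTransformer):
    """Traditional mutation operator: swap a comparison operator
    (e.g. >= becomes <=) to produce an artificially buggy variant."""
    SWAP = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node):
        op = node.ops[0]
        for src, dst in self.SWAP.items():
            if isinstance(op, src):
                node.ops[0] = dst()  # inject the fault
                break
        return node

def mutate(source: str) -> str:
    """Return a mutated (artificially buggy) version of `source`."""
    tree = ast.parse(source)
    tree = ComparisonMutator().visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+

buggy = mutate("def is_adult(age):\n    return age >= 18\n")
print(buggy)  # the `>=` has been flipped to `<=`
```

Mutants like this can be generated at scale for pre-training, after which the model is fine-tuned, with an unchanged learning algorithm, on the smaller set of real bug fixes.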
Related papers
- Towards Understanding the Challenges of Bug Localization in Deep
Learning Systems [2.9312156642007294]
We conduct a large-scale empirical study to better understand the challenges of localizing bugs in deep-learning systems.
First, we determine the bug localization performance of four existing techniques using 2,365 bugs from deep-learning systems and 2,913 from traditional software.
Second, we evaluate how different bug types in deep learning systems impact bug localization.
arXiv Detail & Related papers (2024-02-01T21:17:42Z)
- Automated Bug Generation in the era of Large Language Models [6.0770779409377775]
BugFarm transforms arbitrary code into multiple complex bugs.
We conduct a comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BugFarm.
arXiv Detail & Related papers (2023-10-03T20:01:51Z)
- WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised
Learning [37.09621161662761]
This paper proposes a WEakly supervised bug LocaLization (WELL) method to train a bug localization model.
With CodeBERT fine-tuned on buggy-or-not binary labeled data, WELL addresses bug localization in a weakly supervised manner.
arXiv Detail & Related papers (2023-05-27T06:34:26Z)
- Too Few Bug Reports? Exploring Data Augmentation for Improved
Changeset-based Bug Localization [7.884766610628946]
We propose novel data augmentation operators that act on different constituent components of bug reports.
We also describe a data balancing strategy that aims to create a corpus of augmented bug reports.
arXiv Detail & Related papers (2023-05-25T19:06:01Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- DeepMutants: Training neural bug detectors with contextual mutations [0.799536002595393]
Learning-based bug detectors promise to find bugs in large code bases by exploiting natural hints.
Still, existing techniques tend to underperform when presented with realistic bugs.
We propose a novel contextual mutation operator which dynamically injects natural and more realistic faults into code.
arXiv Detail & Related papers (2021-07-14T12:45:48Z)
- Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge.
Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.