Can we learn from developer mistakes? Learning to localize and repair
real bugs from real bug fixes
- URL: http://arxiv.org/abs/2207.00301v1
- Date: Fri, 1 Jul 2022 09:49:17 GMT
- Title: Can we learn from developer mistakes? Learning to localize and repair
real bugs from real bug fixes
- Authors: Cedric Richter and Heike Wehrheim
- Abstract summary: We introduce RealiT, a pre-train-and-fine-tune approach for learning to localize and repair real bugs from real bug fixes.
We found that training on real bug fixes with RealiT is empirically powerful by nearly doubling the localization performance of an existing model on real bugs.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real bug fixes found in open source repositories seem to be the perfect
source for learning to localize and repair real bugs. However, the absence of
large scale bug fix collections has made it difficult to effectively exploit
real bug fixes in the training of larger neural models in the past. In
contrast, artificial bugs -- produced by mutating existing source code -- can
be easily obtained at a sufficient scale and are therefore often preferred in
the training of existing approaches. Still, localization and repair models that
are trained on artificial bugs usually underperform when faced with real bugs.
This raises the question of whether bug localization and repair models trained
on real bug fixes are more effective at localizing and repairing real bugs.
We address this question by introducing RealiT, a pre-train-and-fine-tune
approach for effectively learning to localize and repair real bugs from real
bug fixes. RealiT is first pre-trained on a large number of artificial bugs
produced by traditional mutation operators and then fine-tuned on a smaller set
of real bug fixes. Fine-tuning does not require any modifications of the
learning algorithm and hence can be easily adopted in various training
scenarios for bug localization or repair (even when real training data is
scarce). In addition, we found that training on real bug fixes with RealiT is
empirically powerful by nearly doubling the localization performance of an
existing model on real bugs while maintaining or even improving the repair
performance.
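The pre-train-and-fine-tune recipe above relies on artificial bugs produced by traditional mutation operators. As an illustrative sketch only (not the authors' implementation), the following Python example shows one such operator, flipping a comparison operator to inject an artificial fault into otherwise correct code:

```python
import ast

class ComparisonMutator(ast.NodeTransformer):
    """Traditional mutation operator: swap a comparison operator
    (e.g. >= becomes <=) to produce an artificially buggy variant."""
    SWAP = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node):
        op = node.ops[0]
        for src, dst in self.SWAP.items():
            if isinstance(op, src):
                node.ops[0] = dst()  # inject the fault
                break
        return node

def mutate(source: str) -> str:
    """Return a mutated (artificially buggy) version of `source`."""
    tree = ast.parse(source)
    tree = ComparisonMutator().visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+

buggy = mutate("def is_adult(age):\n    return age >= 18\n")
print(buggy)  # the `>=` has been flipped to `<=`
```

Mutants like this can be generated at scale for pre-training, after which the model is fine-tuned, with an unchanged learning algorithm, on the smaller set of real bug fixes.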
Related papers
- Towards Understanding the Challenges of Bug Localization in Deep
Learning Systems [2.9312156642007294]
We conduct a large-scale empirical study to better understand the challenges of localizing bugs in deep-learning systems.
First, we determine the bug localization performance of four existing techniques using 2,365 bugs from deep-learning systems and 2,913 from traditional software.
Second, we evaluate how different bug types in deep learning systems impact bug localization.
arXiv Detail & Related papers (2024-02-01T21:17:42Z)
- Automated Bug Generation in the era of Large Language Models [6.0770779409377775]
BugFarm transforms arbitrary code into multiple complex bugs.
We conduct a comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BugFarm.
arXiv Detail & Related papers (2023-10-03T20:01:51Z)
- WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised
Learning [37.09621161662761]
This paper proposes a WEakly supervised bug LocaLization (WELL) method to train a bug localization model.
With CodeBERT fine-tuned on buggy-or-not binary labeled data, WELL addresses bug localization in a weakly supervised manner.
arXiv Detail & Related papers (2023-05-27T06:34:26Z)
- Too Few Bug Reports? Exploring Data Augmentation for Improved
Changeset-based Bug Localization [7.884766610628946]
We propose novel data augmentation operators that act on different constituent components of bug reports.
We also describe a data balancing strategy that aims to create a corpus of augmented bug reports.
arXiv Detail & Related papers (2023-05-25T19:06:01Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- DeepMutants: Training neural bug detectors with contextual mutations [0.799536002595393]
Learning-based bug detectors promise to find bugs in large code bases by exploiting natural hints.
Still, existing techniques tend to underperform when presented with realistic bugs.
We propose a novel contextual mutation operator which dynamically injects natural and more realistic faults into code.
arXiv Detail & Related papers (2021-07-14T12:45:48Z)
- Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge.
Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.