DeepMutants: Training neural bug detectors with contextual mutations
- URL: http://arxiv.org/abs/2107.06657v1
- Date: Wed, 14 Jul 2021 12:45:48 GMT
- Title: DeepMutants: Training neural bug detectors with contextual mutations
- Authors: Cedric Richter, Heike Wehrheim
- Abstract summary: Learning-based bug detectors promise to find bugs in large code bases by exploiting natural hints.
Still, existing techniques tend to underperform when presented with realistic bugs.
We propose a novel contextual mutation operator which dynamically injects natural and more realistic faults into code.
- Score: 0.799536002595393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning-based bug detectors promise to find bugs in large code bases by
exploiting natural hints such as names of variables and functions or comments.
Still, existing techniques tend to underperform when presented with realistic
bugs. We believe that bug detector learning currently suffers from a lack of
realistic defective training examples. In fact, real-world bugs are scarce,
which has driven existing methods to train on artificially created and mostly
unrealistic mutants. In this work, we propose a novel contextual mutation
operator which incorporates knowledge about the mutation context to dynamically
inject natural and more realistic faults into code. Our approach employs a
masked language model to produce a context-dependent distribution over feasible
token replacements. The evaluation shows that sampling from a language model
not only produces mutants which more accurately represent real bugs but
also leads to better-performing bug detectors, both on artificial benchmarks and
on real-world source code.
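As a rough illustration, here is a minimal sketch of one contextual mutation step, assuming a HuggingFace masked language model trained on code. The checkpoint name, the single string-replacement masking, and the sampling details are illustrative assumptions, not the authors' exact operator.

```python
# Minimal sketch: sample a context-dependent replacement for one masked token.
# The checkpoint, masking strategy and sampling details are assumptions, not
# the paper's exact contextual mutation operator.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base-mlm")
model = AutoModelForMaskedLM.from_pretrained("microsoft/codebert-base-mlm")
model.eval()

def contextual_mutant(code: str, target: str) -> str:
    """Replace one occurrence of `target` with a token sampled from the
    masked-LM distribution conditioned on the surrounding code."""
    masked = code.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the mask token in the (single) input sequence.
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0]
    probs = torch.softmax(logits[0, mask_pos], dim=-1).squeeze(0)
    # Best effort: avoid re-sampling the original token.
    probs[tokenizer.convert_tokens_to_ids(target)] = 0.0
    replacement = tokenizer.decode(torch.multinomial(probs, 1)).strip()
    return masked.replace(tokenizer.mask_token, replacement, 1)

print(contextual_mutant("total = price * quantity", "price"))
```

Sampling from the full softmax (rather than taking the top prediction) is what makes the injected faults varied yet still plausible in their context.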
Related papers
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z) - LLMorpheus: Mutation Testing using Large Language Models [7.312170216336085]
This paper presents a technique in which a Large Language Model (LLM) is prompted to suggest mutations: placeholders are inserted into source code and the model is asked what they could be replaced with (a minimal prompt-based sketch appears after this list).
We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool.
arXiv Detail & Related papers (2024-04-15T17:25:14Z) - Automated Bug Generation in the era of Large Language Models [6.0770779409377775]
BugFarm transforms arbitrary code into multiple complex bugs.
A comprehensive evaluation covers 435k+ bugs from over 1.9M mutants generated by BugFarm.
arXiv Detail & Related papers (2023-10-03T20:01:51Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Can we learn from developer mistakes? Learning to localize and repair real bugs from real bug fixes [0.5330240017302619]
We introduce RealiT, a pre-train-and-fine-tune approach for learning to localize and repair real bugs from real bug fixes.
We find that training on real bug fixes with RealiT is highly effective, nearly doubling the localization performance of an existing model on real bugs.
arXiv Detail & Related papers (2022-07-01T09:49:17Z) - On Distribution Shift in Learning-based Bug Detectors [4.511923587827301]
We train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution.
We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution.
arXiv Detail & Related papers (2022-04-21T12:17:22Z) - Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z) - Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z) - On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
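For the LLMorpheus-style prompting mentioned above, a minimal sketch is shown below, assuming an OpenAI-compatible chat API. The prompt wording, the <PLACEHOLDER> marker, and the model name are placeholders, not the tool's actual implementation.

```python
# Minimal sketch of prompt-based mutant suggestion in the spirit of the
# LLMorpheus entry above. Prompt wording, <PLACEHOLDER> marker and model name
# are assumptions, not the tool's actual implementation.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CODE_WITH_PLACEHOLDER = """\
function isAdult(age) {
  return age <PLACEHOLDER> 18;  // original operator: >=
}
"""

prompt = (
    "The following JavaScript function contains the marker <PLACEHOLDER> "
    "where an operator was removed. Suggest three plausible replacements, "
    "one per line, that keep the function syntactically valid:\n\n"
    + CODE_WITH_PLACEHOLDER
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
for candidate in response.choices[0].message.content.splitlines():
    if candidate.strip():
        print("candidate mutant operator:", candidate.strip())
```

Each suggested replacement would then be substituted back into the code and compiled or tested to confirm it yields a valid, behavior-changing mutant.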