Bugfix: a standard language, database schema and repository for research on bugs and automatic program repair
- URL: http://arxiv.org/abs/2502.15599v2
- Date: Thu, 27 Feb 2025 10:48:22 GMT
- Title: Bugfix: a standard language, database schema and repository for research on bugs and automatic program repair
- Authors: Victoria Kananchuk, Ilgiz Mustafin, Bertrand Meyer,
- Abstract summary: Automatic Program Repair (APR) is a brilliant idea: when detecting a bug, also provide suggestions for correcting the program.<n>Bugfix is an effort at providing such a framework: a standardized set of notations, tools and interfaces, as well as a database of bugs and fixes.
- Score: 30.530037104130233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Program Repair (APR) is a brilliant idea: when detecting a bug, also provide suggestions for correcting the program. Progress towards that goal is hindered by the absence of a common frame of reference for the multiplicity of APR ideas, methods, tools, programming languages and environments. Bugfix is an effort at providing such a framework: a standardized set of notations, tools and interfaces, as well as a database of bugs and fixes, for use by the APR research community to try out ideas and compare results. The most directly visible component of the Bugfix effort is the Bugfix language, a human-readable formalism making it possible to describe elements of the following kinds: a bug (described abstractly, for example the permutation of two arguments in a call); a bug example (an actual occurrence of a bug, in a specific code written in a specific programming language, and usually recorded in some repository); a fix (a particular correction of a bug, obtained for example by reversing the misplaced arguments); an application (an entity that demonstrates how a actual code example matches with a fix); a construct (the abstract description of a programming mechanism, for example a ``while'' loop, independently of its realization in a programming language; and a language (a description of how a particular programming language includes certain constructs and provides specific concrete syntax for each of them -- for example Java includes loop, assignment etc. and has a defined format for each of them). A JSON API provides it in a form accessible to tools. Bugfix includes a repository containing a considerable amount of bugs, examples and fixes. Note: An early step towards this article was a short contribution (Ref [1]) to the 2024 ICSE. The present text reuses a few elements of introduction and motivation but is otherwise thoroughly reworked and extended.
Related papers
- Type-Constrained Code Generation with Language Models [51.03439021895432]
Large language models (LLMs) produce uncompilable output because their next-token inference procedure does not model formal aspects of code.
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.
Our approach reduces compilation errors by more than half and increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z) - What is a "bug"? On subjectivity, epistemic power, and implications for
software research [8.116831482130555]
"Bug" has been a colloquialism for an engineering "defect" at least since the 1870s.
Most modern software-oriented definitions speak to a disconnect between what a developer intended and what a program actually does.
"Finding bugs is easy" begins by saying "bug patterns are code that are often errors"
arXiv Detail & Related papers (2024-02-13T01:52:42Z) - A Novel Approach for Automatic Program Repair using Round-Trip
Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - Automated Bug Generation in the era of Large Language Models [6.0770779409377775]
BugFarm transforms arbitrary code into multiple complex bugs.
A comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BUGFARM.
arXiv Detail & Related papers (2023-10-03T20:01:51Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation [4.519754139322584]
Bugsplainer generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits.<n>Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard.<n>We also conduct a developer study involving 20 participants where the explanations from Bugsplainer were found to be more accurate, more precise, more concise and more useful than the baselines.
arXiv Detail & Related papers (2022-12-08T22:19:45Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z) - Advaita: Bug Duplicity Detection System [1.9624064951902522]
Duplicate bugs rate (% of duplicate bugs) are in the range from single digit (1 to 9%) to double digits (40%) based on the product maturity, size of the code and number of engineers working on the project.
Detecting duplicity deals with identifying whether any two bugs convey the same meaning.
This approach considers multiple sets of features viz. basic text statistical features, semantic features and contextual features.
arXiv Detail & Related papers (2020-01-24T04:48:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.