HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding
Tools
- URL: http://arxiv.org/abs/2209.06615v2
- Date: Fri, 2 Feb 2024 14:43:44 GMT
- Title: HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding
Tools
- Authors: Riccardo Felici, Laura Pozzi and Carlo A. Furia
- Abstract summary: We propose a complementary approach that automatically generates programs with seeded bugs.
Our technique, called HyperPUT, builds C programs from a "seed" bug by incrementally applying program transformations.
- Score: 3.8520163964103835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As research in automatically detecting bugs grows and produces new
techniques, having suitable collections of programs with known bugs becomes
crucial to reliably and meaningfully compare the effectiveness of these
techniques. Most of the existing approaches rely on benchmarks collecting
manually curated real-world bugs, or synthetic bugs seeded into real-world
programs. Using real-world programs entails that extending the existing
benchmarks or creating new ones remains a complex time-consuming task.
In this paper, we propose a complementary approach that automatically
generates programs with seeded bugs. Our technique, called HyperPUT, builds C
programs from a "seed" bug by incrementally applying program transformations
(introducing programming constructs such as conditionals, loops, etc.) until a
program of the desired size is generated. In our experimental evaluation, we
demonstrate how HyperPUT can generate buggy programs that can challenge in
different ways the capabilities of modern bug-finding tools, and some of whose
characteristics are comparable to those of bugs in existing benchmarks. These
results suggest that HyperPUT can be a useful tool to support further research
in bug-finding techniques -- in particular their empirical evaluation.
Related papers
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Modelling Concurrency Bugs Using Machine Learning [0.0]
This project aims to compare both common and recent machine learning approaches.
We define a synthetic dataset that we generate with the scope of simulating real-life (concurrent) programs.
We formulate hypotheses about fundamental limits of various machine learning model types.
arXiv Detail & Related papers (2023-05-08T17:30:24Z) - Fully Autonomous Programming with Large Language Models [0.9558392439655015]
Current approaches to program synthesis with Large Language Models (LLMs) exhibit a "near miss syndrome"
We use OpenAI Codex as the LLM and Program Synthesis Benchmark 2 as a database of problem descriptions and tests for evaluation.
The resulting framework outperforms both conventional usage of Codex without the repair phase and traditional genetic programming approaches.
arXiv Detail & Related papers (2023-04-20T16:12:05Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - Iterative Genetic Improvement: Scaling Stochastic Program Synthesis [11.195558777385259]
Program synthesis aims to it automatically find programs from an underlying programming language that satisfy a given specification.
Existing program synthesis techniques do not meet this expectation very well, suffering from the scalability issue.
Here we propose a new framework for program synthesis, called iterative genetic improvement to overcome this problem.
arXiv Detail & Related papers (2022-02-26T02:00:35Z) - Searching for More Efficient Dynamic Programs [61.79535031840558]
We describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a search procedure to improve this metric.
We show that in practice, automated search can find substantial improvements to the initial program.
arXiv Detail & Related papers (2021-09-14T20:52:55Z) - Recent Developments in Program Synthesis with Evolutionary Algorithms [1.8047694351309207]
We identify the relevant evolutionary program synthesis approaches and provide an in-depth analysis of their performance.
The most influential approaches we identify are stack-based, grammar-guided, as well as linear genetic programming.
For future work, we encourage researchers not only to use a program's output for assessing the quality of a solution but also the way towards a solution.
arXiv Detail & Related papers (2021-08-27T11:38:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.