RunBugRun -- An Executable Dataset for Automated Program Repair
- URL: http://arxiv.org/abs/2304.01102v1
- Date: Mon, 3 Apr 2023 16:02:00 GMT
- Title: RunBugRun -- An Executable Dataset for Automated Program Repair
- Authors: Julian Aron Prenner and Romain Robbes
- Abstract summary: We present a fully executable dataset of 450,000 small buggy/fixed program pairs originally submitted to programming competition websites.
We provide infrastructure to compile, safely execute and test programs as well as fine-grained bug-type labels.
- Score: 15.670905650869704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been a transition to data-driven techniques in
Automated Program Repair (APR), in particular towards deep neural networks. This entails
training on hundreds of thousands or even millions of non-executable code
fragments. We would like to bring more attention to an aspect of code often
neglected in Neural Program Repair (NPR), namely its execution. Code execution
has several significant advantages. It allows for test-based evaluation of
candidate fixes and can provide valuable information to aid repair. In this
work we present a fully executable dataset of 450,000 small buggy/fixed program
pairs, written in eight different programming languages, that were originally
submitted to programming competition websites. Along with the dataset we provide
infrastructure to compile, safely execute and test programs as well as
fine-grained bug-type labels. To give a point of reference, we provide basic
evaluation results for two baselines, one based on a generate-and-validate
approach and one on deep learning. With this dataset we follow several goals:
we want to lift Neural Program Repair beyond fully static code representations,
foster the use of execution-based features and, by including several different
languages, counterbalance the predominance of Java in the current landscape of
APR datasets and benchmarks.
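To make the test-based evaluation concrete, the sketch below shows a minimal generate-and-validate loop in Python. The I/O test format, the `generate_candidates` callback, and the direct use of `subprocess` are illustrative assumptions, not the RunBugRun infrastructure itself, which compiles and safely sandboxes programs.

```python
import subprocess
from typing import Callable

# Hypothetical I/O test format: list of (stdin, expected stdout) pairs.
Tests = list[tuple[str, str]]

def passes_tests(source_file: str, tests: Tests, timeout: float = 5.0) -> bool:
    """Run one candidate program against all I/O tests."""
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                ["python", source_file],  # real infrastructure would sandbox this
                input=stdin_data, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a non-terminating candidate counts as a failure
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

def repair(buggy_file: str, tests: Tests,
           generate_candidates: Callable[[str], list[str]]) -> str | None:
    """Generate-and-validate: return the first candidate passing all tests."""
    for i, candidate_src in enumerate(generate_candidates(buggy_file)):
        candidate_file = f"candidate_{i}.py"
        with open(candidate_file, "w") as f:
            f.write(candidate_src)
        if passes_tests(candidate_file, tests):
            return candidate_src
    return None
```

Note that a candidate passing all tests is only plausible rather than proven correct: it may overfit the visible test suite.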
Related papers
- Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training [1.660242118349614]
We show that inputs, like code, can be generated automatically at scale.
Our approach is able to produce valid inputs, including initial memory states, for 90% of the ComPile dataset modules.
arXiv Detail & Related papers (2024-06-13T06:09:16Z)
- Benchmarking Educational Program Repair [4.981275578987307]
Large language models (LLMs) can be used to generate learning resources, improve error messages, and provide feedback on code.
There is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches.
In this article, we propose a novel educational program repair benchmark.
arXiv Detail & Related papers (2024-05-08T18:23:59Z)
- NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
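The NExT paper's exact trace representation is not reproduced here; as a rough illustration of what an execution trace can contain, the following sketch uses Python's built-in `sys.settrace` hook to record line numbers and local variable states, which could then be serialized into a model prompt.

```python
import sys

def collect_trace(func, *args):
    """Record (line number, local variable snapshot) for every executed
    line of `func`: a crude stand-in for an execution trace."""
    trace = []

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__:
            if event == "line":
                trace.append((frame.f_lineno, dict(frame.f_locals)))
            return tracer  # keep tracing inside this frame
        return None        # ignore all other frames

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # bug: off-by-one denominator

result, trace = collect_trace(buggy_mean, [2, 4, 6])
for lineno, variables in trace:
    print(lineno, variables)  # lines like these could be placed in a prompt
```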
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
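As a sketch of the round-trip idea under stated assumptions: the `translate` helper below is a hypothetical LLM-backed function, not an API from the paper; only the control flow of RTT is shown.

```python
from typing import Callable

# (code, source_lang, target_lang) -> translated code; a hypothetical
# LLM-backed helper, not an API from the paper.
Translate = Callable[[str, str, str], str]

def round_trip_repair(buggy_code: str, translate: Translate,
                      pivot_lang: str = "Java") -> str:
    """Translate the buggy program into a pivot language and back; the
    model's regression toward 'typical' code can remove the bug."""
    intermediate = translate(buggy_code, "Python", pivot_lang)
    return translate(intermediate, pivot_lang, "Python")
```

Candidates produced this way still need validation, for example with a test-execution loop like the one sketched for RunBugRun above.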
- Flexible Control Flow Graph Alignment for Delivering Data-Driven Feedback to Novice Programming Learners [0.847136673632881]
We present several modifications to CLARA, a data-driven automated repair approach that is open source.
We extend CLARA's abstract syntax tree processor to handle common introductory programming constructs.
We modify an incorrect program's control flow graph to match that of a correct program so that CLARA's original repair process can be applied.
arXiv Detail & Related papers (2024-01-02T19:56:50Z)
- Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs [0.0]
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful".
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
arXiv Detail & Related papers (2023-10-14T12:09:43Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
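A minimal sketch of such a feedback loop, assuming hypothetical `llm` and `run_tests` helpers supplied by the caller (the paper's few-shot demonstrations are elided from the prompt):

```python
from typing import Callable, Optional

def self_debug(llm: Callable[[str], str],
               run_tests: Callable[[str], Optional[str]],
               task: str, max_rounds: int = 3) -> str:
    """Generate a program, execute it, and feed failures back to the model.
    `llm` maps a prompt to a completion; `run_tests` returns an error
    description for the first failing test, or None if all tests pass.
    Both are assumed helpers, not APIs from the paper."""
    code = llm(f"Write a Python program for the following task:\n{task}")
    for _ in range(max_rounds):
        feedback = run_tests(code)
        if feedback is None:  # all tests pass
            return code
        code = llm(
            "The program below fails with the feedback shown. "
            "Explain the bug, then output a corrected program.\n\n"
            f"Feedback:\n{feedback}\n\nProgram:\n{code}"
        )
    return code
```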
- Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5 [14.712753336831172]
We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
arXiv Detail & Related papers (2022-11-27T16:11:29Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We introduce execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
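A minimal sketch of execution-based minimum Bayes risk selection, assuming a sandboxed `execute(code, stdin)` runner: each sampled candidate is run on the same inputs, and the candidate whose outputs agree with the most other candidates is selected.

```python
from collections import Counter
from typing import Callable

def mbr_exec_select(candidates: list[str], test_inputs: list[str],
                    execute: Callable[[str, str], str]) -> str:
    """Pick the sampled program whose execution results agree with the
    most other samples (minimum Bayes risk under a match-based loss).
    `execute(code, stdin) -> stdout` is an assumed sandboxed runner."""
    signatures = [
        tuple(execute(code, stdin) for stdin in test_inputs)
        for code in candidates
    ]
    counts = Counter(signatures)
    best = max(range(len(candidates)), key=lambda i: counts[signatures[i]])
    return candidates[best]
```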
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
- Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback.
We then apply a graph neural network on top to model the reasoning process.
We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)
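The paper's program-feedback graph is richer than this, but the core idea of linking identical identifiers across source code and diagnostic feedback can be sketched as follows (the tokenization and edge rules here are simplifications, not the paper's construction):

```python
import re

def program_feedback_graph(code_lines: list[str], diagnostic: str):
    """Connect occurrences of the same identifier in the source code and
    in the diagnostic message with edges (a simplified sketch)."""
    ident = re.compile(r"[A-Za-z_]\w*")
    code_nodes = [
        ("code", lineno, tok)
        for lineno, line in enumerate(code_lines)
        for tok in ident.findall(line)
    ]
    feedback_nodes = [("feedback", tok) for tok in ident.findall(diagnostic)]
    edges = [
        (f, c)
        for f in feedback_nodes
        for c in code_nodes
        if f[1] == c[2]  # same identifier appears in code and feedback
    ]
    return code_nodes + feedback_nodes, edges

# Example: the undeclared variable 'm' links the diagnostic to line 1.
nodes, edges = program_feedback_graph(
    ['int n = read();', 'printf("%d", m);'],
    "error: 'm' undeclared (first use in this function)",
)
```

A graph neural network can then propagate information over these edges to localize and repair the fault.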
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.