RunBugRun -- An Executable Dataset for Automated Program Repair
- URL: http://arxiv.org/abs/2304.01102v1
- Date: Mon, 3 Apr 2023 16:02:00 GMT
- Title: RunBugRun -- An Executable Dataset for Automated Program Repair
- Authors: Julian Aron Prenner and Romain Robbes
- Abstract summary: We present a fully executable dataset of 450,000 small buggy/fixed program pairs originally submitted to programming competition websites.
We provide infrastructure to compile, safely execute and test programs as well as fine-grained bug-type labels.
- Score: 15.670905650869704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been a transition to data-driven techniques in
Automated Program Repair (APR), in particular towards deep neural networks. This entails
training on hundreds of thousands or even millions of non-executable code
fragments. We would like to bring more attention to an aspect of code often
neglected in Neural Program Repair (NPR), namely its execution. Code execution
has several significant advantages. It allows for test-based evaluation of
candidate fixes and can provide valuable information to aid repair. In this
work we present a fully executable dataset of 450,000 small buggy/fixed program
pairs, written in eight different programming languages, that were originally
submitted to programming competition websites. Along with the dataset we provide
infrastructure to compile, safely execute and test programs as well as
fine-grained bug-type labels. To give a point of reference, we provide basic
evaluation results for two baselines, one based on a generate-and-validate
approach and one on deep learning. With this dataset we follow several goals:
we want to lift Neural Program Repair beyond fully static code representations,
foster the use of execution-based features and, by including several different
languages, counterbalance the predominance of Java in the current landscape of
APR datasets and benchmarks.
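To make the test-based evaluation concrete, the sketch below shows a minimal generate-and-validate loop in Python. The I/O test format, the `generate_candidates` callback, and the direct use of `subprocess` are illustrative assumptions, not the RunBugRun infrastructure itself, which compiles and safely sandboxes programs.

```python
import subprocess
from typing import Callable

# Hypothetical I/O test format: list of (stdin, expected stdout) pairs.
Tests = list[tuple[str, str]]

def passes_tests(source_file: str, tests: Tests, timeout: float = 5.0) -> bool:
    """Run one candidate program against all I/O tests."""
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                ["python", source_file],  # real infrastructure would sandbox this
                input=stdin_data, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a non-terminating candidate counts as a failure
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

def repair(buggy_file: str, tests: Tests,
           generate_candidates: Callable[[str], list[str]]) -> str | None:
    """Generate-and-validate: return the first candidate passing all tests."""
    for i, candidate_src in enumerate(generate_candidates(buggy_file)):
        candidate_file = f"candidate_{i}.py"
        with open(candidate_file, "w") as f:
            f.write(candidate_src)
        if passes_tests(candidate_file, tests):
            return candidate_src
    return None
```

Note that a candidate passing all tests is only plausible rather than proven correct: it may overfit the visible test suite.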
Related papers
- Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training [1.660242118349614]
We show that inputs, like code, can be generated automatically at scale.
Our approach is able to produce valid inputs, including initial memory states, for 90% of the ComPile dataset modules.
arXiv Detail & Related papers (2024-06-13T06:09:16Z)
- Benchmarking Educational Program Repair [4.981275578987307]
Large language models (LLMs) can be used to generate learning resources, improve error messages, and provide feedback on code.
There is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches.
In this article, we propose a novel educational program repair benchmark.
arXiv Detail & Related papers (2024-05-08T18:23:59Z)
- NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
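The NExT paper's exact trace representation is not reproduced here; as a rough illustration of what an execution trace can contain, the following sketch uses Python's built-in `sys.settrace` hook to record line numbers and local variable states, which could then be serialized into a model prompt.

```python
import sys

def collect_trace(func, *args):
    """Record (line number, local variable snapshot) for every executed
    line of `func`: a crude stand-in for an execution trace."""
    trace = []

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__:
            if event == "line":
                trace.append((frame.f_lineno, dict(frame.f_locals)))
            return tracer  # keep tracing inside this frame
        return None        # ignore all other frames

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # bug: off-by-one denominator

result, trace = collect_trace(buggy_mean, [2, 4, 6])
for lineno, variables in trace:
    print(lineno, variables)  # lines like these could be placed in a prompt
```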
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
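As a sketch of the round-trip idea under stated assumptions: the `translate` helper below is a hypothetical LLM-backed function, not an API from the paper; only the control flow of RTT is shown.

```python
from typing import Callable

# (code, source_lang, target_lang) -> translated code; a hypothetical
# LLM-backed helper, not an API from the paper.
Translate = Callable[[str, str, str], str]

def round_trip_repair(buggy_code: str, translate: Translate,
                      pivot_lang: str = "Java") -> str:
    """Translate the buggy program into a pivot language and back; the
    model's regression toward 'typical' code can remove the bug."""
    intermediate = translate(buggy_code, "Python", pivot_lang)
    return translate(intermediate, pivot_lang, "Python")
```

Candidates produced this way still need validation, for example with a test-execution loop like the one sketched for RunBugRun above.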
- Flexible Control Flow Graph Alignment for Delivering Data-Driven Feedback to Novice Programming Learners [0.847136673632881]
We present several modifications to CLARA, a data-driven automated repair approach that is open source.
We extend CLARA's abstract syntax tree processor to handle common introductory programming constructs.
We modify an incorrect program's control flow graph to match that of a correct program so that CLARA's original repair process can be applied.
arXiv Detail & Related papers (2024-01-02T19:56:50Z)
- Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs [0.0]
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful".
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
arXiv Detail & Related papers (2023-10-14T12:09:43Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
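A minimal sketch of such a feedback loop, assuming hypothetical `llm` and `run_tests` helpers supplied by the caller (the paper's few-shot demonstrations are elided from the prompt):

```python
from typing import Callable, Optional

def self_debug(llm: Callable[[str], str],
               run_tests: Callable[[str], Optional[str]],
               task: str, max_rounds: int = 3) -> str:
    """Generate a program, execute it, and feed failures back to the model.
    `llm` maps a prompt to a completion; `run_tests` returns an error
    description for the first failing test, or None if all tests pass.
    Both are assumed helpers, not APIs from the paper."""
    code = llm(f"Write a Python program for the following task:\n{task}")
    for _ in range(max_rounds):
        feedback = run_tests(code)
        if feedback is None:  # all tests pass
            return code
        code = llm(
            "The program below fails with the feedback shown. "
            "Explain the bug, then output a corrected program.\n\n"
            f"Feedback:\n{feedback}\n\nProgram:\n{code}"
        )
    return code
```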
- Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5 [14.712753336831172]
We propose a novel unified Detect-Localize-Repair framework based on the pretrained programming language model CodeT5.
Our model significantly outperforms existing baselines from both NLP and software engineering domains.
arXiv Detail & Related papers (2022-11-27T16:11:29Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We introduce execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
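A minimal sketch of execution-based minimum Bayes risk selection, assuming a sandboxed `execute(code, stdin)` runner: each sampled candidate is run on the same inputs, and the candidate whose outputs agree with the most other candidates is selected.

```python
from collections import Counter
from typing import Callable

def mbr_exec_select(candidates: list[str], test_inputs: list[str],
                    execute: Callable[[str, str], str]) -> str:
    """Pick the sampled program whose execution results agree with the
    most other samples (minimum Bayes risk under a match-based loss).
    `execute(code, stdin) -> stdout` is an assumed sandboxed runner."""
    signatures = [
        tuple(execute(code, stdin) for stdin in test_inputs)
        for code in candidates
    ]
    counts = Counter(signatures)
    best = max(range(len(candidates)), key=lambda i: counts[signatures[i]])
    return candidates[best]
```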
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
- Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback.
We then apply a graph neural network on top to model the reasoning process.
We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)
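The paper's program-feedback graph is richer than this, but the core idea of linking identical identifiers across source code and diagnostic feedback can be sketched as follows (the tokenization and edge rules here are simplifications, not the paper's construction):

```python
import re

def program_feedback_graph(code_lines: list[str], diagnostic: str):
    """Connect occurrences of the same identifier in the source code and
    in the diagnostic message with edges (a simplified sketch)."""
    ident = re.compile(r"[A-Za-z_]\w*")
    code_nodes = [
        ("code", lineno, tok)
        for lineno, line in enumerate(code_lines)
        for tok in ident.findall(line)
    ]
    feedback_nodes = [("feedback", tok) for tok in ident.findall(diagnostic)]
    edges = [
        (f, c)
        for f in feedback_nodes
        for c in code_nodes
        if f[1] == c[2]  # same identifier appears in code and feedback
    ]
    return code_nodes + feedback_nodes, edges

# Example: the undeclared variable 'm' links the diagnostic to line 1.
nodes, edges = program_feedback_graph(
    ['int n = read();', 'printf("%d", m);'],
    "error: 'm' undeclared (first use in this function)",
)
```

A graph neural network can then propagate information over these edges to localize and repair the fault.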
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.