GitBug-Java: A Reproducible Benchmark of Recent Java Bugs
- URL: http://arxiv.org/abs/2402.02961v2
- Date: Tue, 6 Feb 2024 14:08:38 GMT
- Title: GitBug-Java: A Reproducible Benchmark of Recent Java Bugs
- Authors: André Silva, Nuno Saavedra, Martin Monperrus
- Abstract summary: We present GitBug-Java, a reproducible benchmark of recent Java bugs.
GitBug-Java features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories.
- Score: 8.508198765617196
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Bug-fix benchmarks are essential for evaluating methodologies in automatic
program repair (APR) and fault localization (FL). However, existing benchmarks,
exemplified by Defects4J, need to evolve to incorporate recent bug-fixes
aligned with contemporary development practices. Moreover, reproducibility, a
key scientific principle, has been lacking in bug-fix benchmarks. To address
these gaps, we present GitBug-Java, a reproducible benchmark of recent Java
bugs. GitBug-Java features 199 bugs extracted from the 2023 commit history of
55 notable open-source repositories. The methodology for building GitBug-Java
ensures the preservation of bug-fixes in fully-reproducible environments. We
publish GitBug-Java at https://github.com/gitbugactions/gitbug-java.
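For readers who want to try the benchmark, below is a minimal sketch of driving it from Python. The command names (`bids`, `checkout`, `run`) are assumptions based on the usage documented in the gitbug-java repository and may not match the current interface.

```python
import subprocess

def gitbug(*args: str) -> str:
    """Invoke the gitbug-java CLI and return its stdout.

    Command names are assumptions taken from the documentation at
    https://github.com/gitbugactions/gitbug-java; check the README
    for the authoritative interface.
    """
    result = subprocess.run(
        ["gitbug-java", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# List available bug identifiers, check one out into a working
# directory, and run its tests in the reproducible environment.
bug_ids = gitbug("bids").splitlines()
gitbug("checkout", bug_ids[0], "/tmp/gitbug-workdir")
print(gitbug("run", "/tmp/gitbug-workdir"))
```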
Related papers
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques do not adequately address.
JailbreakBench is an open-sourced benchmark comprising several components that address these challenges.
arXiv Detail & Related papers (2024-03-28T02:44:02Z)
- MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [47.850418420195304]
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues.
We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
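As a rough illustration only: a multi-agent issue-resolution pipeline of this kind can be pictured as a chain of specialized roles. The role names and interfaces below are illustrative assumptions, not MAGIS's actual design.

```python
from typing import Callable

# Each hypothetical agent is modeled as a function from its input
# artifact (a string) to its output artifact.
Agent = Callable[[str], str]

def resolve_issue(issue: str, plan: Agent, locate: Agent,
                  edit: Agent, review: Agent) -> str:
    """Chain four illustrative agents: plan the work, locate the
    relevant files, draft a code change, and review the result."""
    task_plan = plan(issue)
    file_context = locate(task_plan)
    patch = edit(file_context)
    return review(patch)
```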
arXiv Detail & Related papers (2024-03-26T17:57:57Z)
- BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies [8.746971239693066]
For the first time, Python outperformed Java in the Stack Overflow developer survey.
This is in stark contrast with the abundance of testing and debugging tools available for Java.
In this project, we create a benchmark database and tool containing 493 real bugs from 17 real-world Python programs.
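To give a feel for how such a database is consumed in controlled experiments, here is a hedged sketch of a checkout/compile/test cycle. The `bugsinpy-*` command names and flags are assumptions based on the project's README and may differ from the actual tooling.

```python
import subprocess

# Hypothetical example parameters: a project, a bug id, and a
# working directory (version 0 is assumed to mean the buggy revision).
PROJECT, BUG_ID, WORKDIR = "pandas", "1", "/tmp/bugsinpy-work"

# Check out the buggy revision, build it, then run the
# bug-revealing test. Commands and flags are assumptions.
subprocess.run(["bugsinpy-checkout", "-p", PROJECT, "-i", BUG_ID,
                "-v", "0", "-w", WORKDIR], check=True)
subprocess.run(["bugsinpy-compile"], cwd=f"{WORKDIR}/{PROJECT}", check=True)
subprocess.run(["bugsinpy-test"], cwd=f"{WORKDIR}/{PROJECT}", check=True)
```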
arXiv Detail & Related papers (2024-01-27T19:07:34Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
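The zero-shot evaluation it describes can be pictured as the loop below; `query_model` and the record fields are hypothetical stand-ins rather than DebugBench's actual API.

```python
from dataclasses import dataclass

@dataclass
class DebugTask:
    language: str      # "cpp", "java", or "python"
    bug_category: str  # one of the four major categories
    buggy_code: str

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; substitute a real client here."""
    raise NotImplementedError

def zero_shot_repair(task: DebugTask) -> str:
    # Zero-shot: only the instruction and the buggy code, with no
    # in-context examples of previously fixed bugs.
    prompt = (f"Fix the bug in the following {task.language} code "
              f"and return only the corrected code.\n\n{task.buggy_code}")
    return query_model(prompt)
```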
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions [8.508198765617196]
We present GitBug-Actions, a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes.
GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes.
To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark.
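The core detection idea, a bug-fix being a commit whose CI test run flips from failing on the parent to passing on the commit itself, can be sketched as follows. This is a conceptual sketch, not the tool's actual implementation.

```python
from typing import Callable, Iterable

def mine_bug_fixes(history: Iterable[tuple[str, str]],
                   ci_passes: Callable[[str], bool]) -> list[str]:
    """Return commits whose CI run flips from fail to pass.

    `history` yields (commit, parent) pairs; `ci_passes` abstracts
    executing the project's GitHub Actions workflow at a given
    revision and reporting whether the test suite succeeded.
    """
    return [commit for commit, parent in history
            if not ci_passes(parent) and ci_passes(commit)]
```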
arXiv Detail & Related papers (2023-10-24T09:04:14Z)
- PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection [8.79879909193717]
We introduce PreciseBugCollector, a precise, multi-language bug collection approach.
It is based on two novel components: a bug tracker that maps repositories to external bug repositories to trace bug type information, and a bug injector that generates project-specific bugs.
To date, PreciseBugCollector comprises 1,057,818 bugs extracted from 2,968 open-source projects.
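The bug-injector idea, deriving artificial bugs by perturbing correct code, can be illustrated with a toy mutation pass. This is a generic sketch of the technique, not PreciseBugCollector's actual injector.

```python
import re

# Toy operator-level mutations in the spirit of bug injection:
# each maps a correct construct to a plausible faulty one.
MUTATIONS = [
    (r"<=", "<"),    # off-by-one boundary fault
    (r"==", "!="),   # negated condition fault
    (r"\+", "-"),    # wrong arithmetic operator
]

def inject_bugs(source: str) -> list[str]:
    """Return buggy variants of `source`, one per applicable mutation."""
    variants = []
    for pattern, replacement in MUTATIONS:
        mutated, n = re.subn(pattern, replacement, source, count=1)
        if n:
            variants.append(mutated)
    return variants

print(inject_bugs("if (i <= n) { sum = sum + a[i]; }"))
```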
arXiv Detail & Related papers (2023-09-12T13:47:44Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework.
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
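Retrieval augmentation here means prepending a similar historical bug-fix pair to the generator's input. The sketch below uses a naive token-overlap similarity purely for illustration; the actual retriever is learned, and the formatting of the augmented input is an assumption.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens: a crude stand-in for a
    learned patch retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def augment_input(buggy: str, fix_db: list[tuple[str, str]]) -> str:
    """Prepend the most similar past (bug, fix) pair so the patch
    generator can imitate the retrieved fix pattern."""
    past_bug, past_fix = max(fix_db, key=lambda pair: similarity(buggy, pair[0]))
    return (f"// similar bug:\n{past_bug}\n// its fix:\n{past_fix}\n"
            f"// current bug:\n{buggy}")

db = [("while (i <= n) i++;", "while (i < n) i++;")]
print(augment_input("while (j <= m) j++;", db))
```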
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
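Concretely, using discussion text means conditioning the repair model on it alongside the buggy code. The prompt format and truncation below are illustrative choices, not taken from the paper.

```python
def build_repair_input(buggy_code: str, discussion: str,
                       max_context_chars: int = 2000) -> str:
    """Combine naturally occurring bug-report discussion with the
    buggy code as model input. The character limit is an
    illustrative assumption."""
    return ("Bug report discussion:\n" + discussion[:max_context_chars]
            + "\n\nBuggy code:\n" + buggy_code
            + "\n\nFixed code:")
```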
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons [5.564793925574796]
We present an approach to automated debugging using large, pretrained transformers.
We start by training a bug-creation model on reversed commit data for the purpose of generating synthetic bugs.
Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions that are covered by passing tests.
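Training a bug-creation model on reversed commit data means swapping the direction of mined fix commits so the model maps fixed code back to buggy code. A minimal sketch of that data preparation, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class FixCommit:
    before: str  # function body before the bug-fix commit (buggy)
    after: str   # function body after the fix (correct)

def reversed_examples(commits: list[FixCommit]) -> list[tuple[str, str]]:
    """Build (input, target) pairs for a bug-creation model by
    reversing commit direction: given correct code, predict its
    buggy form. Such a model can then synthesize bugs for any
    function covered by passing tests."""
    return [(c.after, c.before) for c in commits]
```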
arXiv Detail & Related papers (2021-05-19T18:40:16Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach that learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
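The deletion-only distinction can be made mechanical: a predicted fix counts as deletion-only if its diff removes lines without inserting any. A small sketch of that check, as one plausible reading of the metric:

```python
import difflib

def is_deletion_only(buggy: str, fixed: str) -> bool:
    """True when the fix only removes lines and inserts nothing.

    Deletion-only fixes are trivial to produce, so reporting them
    separately from substantive (non-deletion) fixes gives a more
    honest picture of repair quality.
    """
    diff = difflib.ndiff(buggy.splitlines(), fixed.splitlines())
    changes = [ln for ln in diff if ln.startswith(("+ ", "- "))]
    return bool(changes) and all(ln.startswith("- ") for ln in changes)

print(is_deletion_only("a\nbug\nb", "a\nb"))       # True: only removed
print(is_deletion_only("a\nbug\nb", "a\nfix\nb"))  # False: inserted a line
```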
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.