GitBug-Java: A Reproducible Benchmark of Recent Java Bugs
- URL: http://arxiv.org/abs/2402.02961v2
- Date: Tue, 6 Feb 2024 14:08:38 GMT
- Title: GitBug-Java: A Reproducible Benchmark of Recent Java Bugs
- Authors: André Silva, Nuno Saavedra, Martin Monperrus
- Abstract summary: We present GitBug-Java, a reproducible benchmark of recent Java bugs.
GitBug-Java features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories.
- Score: 8.508198765617196
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Bug-fix benchmarks are essential for evaluating methodologies in automatic
program repair (APR) and fault localization (FL). However, existing benchmarks,
exemplified by Defects4J, need to evolve to incorporate recent bug-fixes
aligned with contemporary development practices. Moreover, reproducibility, a
key scientific principle, has been lacking in bug-fix benchmarks. To address
these gaps, we present GitBug-Java, a reproducible benchmark of recent Java
bugs. GitBug-Java features 199 bugs extracted from the 2023 commit history of
55 notable open-source repositories. The methodology for building GitBug-Java
ensures the preservation of bug-fixes in fully-reproducible environments. We
publish GitBug-Java at https://github.com/gitbugactions/gitbug-java.
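For readers who want to try the benchmark, below is a minimal sketch of driving it from Python. The command names (`bids`, `checkout`, `run`) are assumptions based on the usage documented in the gitbug-java repository and may not match the current interface.

```python
import subprocess

def gitbug(*args: str) -> str:
    """Invoke the gitbug-java CLI and return its stdout.

    Command names are assumptions taken from the documentation at
    https://github.com/gitbugactions/gitbug-java; check the README
    for the authoritative interface.
    """
    result = subprocess.run(
        ["gitbug-java", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# List available bug identifiers, check one out into a working
# directory, and run its tests in the reproducible environment.
bug_ids = gitbug("bids").splitlines()
gitbug("checkout", bug_ids[0], "/tmp/gitbug-workdir")
print(gitbug("run", "/tmp/gitbug-workdir"))
```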
Related papers
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques do not adequately address.
JailbreakBench is an open-sourced benchmark comprising several components that address these challenges.
arXiv Detail & Related papers (2024-03-28T02:44:02Z)
- MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [47.850418420195304]
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues.
We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
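As a rough illustration only: a multi-agent issue-resolution pipeline of this kind can be pictured as a chain of specialized roles. The role names and interfaces below are illustrative assumptions, not MAGIS's actual design.

```python
from typing import Callable

# Each hypothetical agent is modeled as a function from its input
# artifact (a string) to its output artifact.
Agent = Callable[[str], str]

def resolve_issue(issue: str, plan: Agent, locate: Agent,
                  edit: Agent, review: Agent) -> str:
    """Chain four illustrative agents: plan the work, locate the
    relevant files, draft a code change, and review the result."""
    task_plan = plan(issue)
    file_context = locate(task_plan)
    patch = edit(file_context)
    return review(patch)
```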
arXiv Detail & Related papers (2024-03-26T17:57:57Z)
- BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies [8.746971239693066]
For the first time, Python outperformed Java in the Stack Overflow developer survey.
This is in stark contrast with the abundance of testing and debugging tools available for Java.
In this project, we create a benchmark database and tool containing 493 real bugs from 17 real-world Python programs.
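To give a feel for how such a database is consumed in controlled experiments, here is a hedged sketch of a checkout/compile/test cycle. The `bugsinpy-*` command names and flags are assumptions based on the project's README and may differ from the actual tooling.

```python
import subprocess

# Hypothetical example parameters: a project, a bug id, and a
# working directory (version 0 is assumed to mean the buggy revision).
PROJECT, BUG_ID, WORKDIR = "pandas", "1", "/tmp/bugsinpy-work"

# Check out the buggy revision, build it, then run the
# bug-revealing test. Commands and flags are assumptions.
subprocess.run(["bugsinpy-checkout", "-p", PROJECT, "-i", BUG_ID,
                "-v", "0", "-w", WORKDIR], check=True)
subprocess.run(["bugsinpy-compile"], cwd=f"{WORKDIR}/{PROJECT}", check=True)
subprocess.run(["bugsinpy-test"], cwd=f"{WORKDIR}/{PROJECT}", check=True)
```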
arXiv Detail & Related papers (2024-01-27T19:07:34Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
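The zero-shot evaluation it describes can be pictured as the loop below; `query_model` and the record fields are hypothetical stand-ins rather than DebugBench's actual API.

```python
from dataclasses import dataclass

@dataclass
class DebugTask:
    language: str      # "cpp", "java", or "python"
    bug_category: str  # one of the four major categories
    buggy_code: str

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; substitute a real client here."""
    raise NotImplementedError

def zero_shot_repair(task: DebugTask) -> str:
    # Zero-shot: only the instruction and the buggy code, with no
    # in-context examples of previously fixed bugs.
    prompt = (f"Fix the bug in the following {task.language} code "
              f"and return only the corrected code.\n\n{task.buggy_code}")
    return query_model(prompt)
```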
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions [8.508198765617196]
We present GitBug-Actions, a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes.
GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes.
To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark.
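The core detection idea, a bug-fix being a commit whose CI test run flips from failing on the parent to passing on the commit itself, can be sketched as follows. This is a conceptual sketch, not the tool's actual implementation.

```python
from typing import Callable, Iterable

def mine_bug_fixes(history: Iterable[tuple[str, str]],
                   ci_passes: Callable[[str], bool]) -> list[str]:
    """Return commits whose CI run flips from fail to pass.

    `history` yields (commit, parent) pairs; `ci_passes` abstracts
    executing the project's GitHub Actions workflow at a given
    revision and reporting whether the test suite succeeded.
    """
    return [commit for commit, parent in history
            if not ci_passes(parent) and ci_passes(commit)]
```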
arXiv Detail & Related papers (2023-10-24T09:04:14Z)
- PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection [8.79879909193717]
We introduce PreciseBugCollector, a precise, multi-language bug collection approach.
It is based on two novel components: a bug tracker that maps repositories to external bug repositories to trace bug type information, and a bug injector that generates project-specific bugs.
To date, PreciseBugCollector comprises 1,057,818 bugs extracted from 2,968 open-source projects.
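The bug-injector idea, deriving artificial bugs by perturbing correct code, can be illustrated with a toy mutation pass. This is a generic sketch of the technique, not PreciseBugCollector's actual injector.

```python
import re

# Toy operator-level mutations in the spirit of bug injection:
# each maps a correct construct to a plausible faulty one.
MUTATIONS = [
    (r"<=", "<"),    # off-by-one boundary fault
    (r"==", "!="),   # negated condition fault
    (r"\+", "-"),    # wrong arithmetic operator
]

def inject_bugs(source: str) -> list[str]:
    """Return buggy variants of `source`, one per applicable mutation."""
    variants = []
    for pattern, replacement in MUTATIONS:
        mutated, n = re.subn(pattern, replacement, source, count=1)
        if n:
            variants.append(mutated)
    return variants

print(inject_bugs("if (i <= n) { sum = sum + a[i]; }"))
```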
arXiv Detail & Related papers (2023-09-12T13:47:44Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework.
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
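Retrieval augmentation here means prepending a similar historical bug-fix pair to the generator's input. The sketch below uses a naive token-overlap similarity purely for illustration; the actual retriever is learned, and the formatting of the augmented input is an assumption.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens: a crude stand-in for a
    learned patch retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def augment_input(buggy: str, fix_db: list[tuple[str, str]]) -> str:
    """Prepend the most similar past (bug, fix) pair so the patch
    generator can imitate the retrieved fix pattern."""
    past_bug, past_fix = max(fix_db, key=lambda pair: similarity(buggy, pair[0]))
    return (f"// similar bug:\n{past_bug}\n// its fix:\n{past_fix}\n"
            f"// current bug:\n{buggy}")

db = [("while (i <= n) i++;", "while (i < n) i++;")]
print(augment_input("while (j <= m) j++;", db))
```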
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
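Concretely, using discussion text means conditioning the repair model on it alongside the buggy code. The prompt format and truncation below are illustrative choices, not taken from the paper.

```python
def build_repair_input(buggy_code: str, discussion: str,
                       max_context_chars: int = 2000) -> str:
    """Combine naturally occurring bug-report discussion with the
    buggy code as model input. The character limit is an
    illustrative assumption."""
    return ("Bug report discussion:\n" + discussion[:max_context_chars]
            + "\n\nBuggy code:\n" + buggy_code
            + "\n\nFixed code:")
```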
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons [5.564793925574796]
We present an approach to automated debugging using large, pretrained transformers.
We start by training a bug-creation model on reversed commit data for the purpose of generating synthetic bugs.
Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions that are covered by passing tests.
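Training a bug-creation model on reversed commit data means swapping the direction of mined fix commits so the model maps fixed code back to buggy code. A minimal sketch of that data preparation, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class FixCommit:
    before: str  # function body before the bug-fix commit (buggy)
    after: str   # function body after the fix (correct)

def reversed_examples(commits: list[FixCommit]) -> list[tuple[str, str]]:
    """Build (input, target) pairs for a bug-creation model by
    reversing commit direction: given correct code, predict its
    buggy form. Such a model can then synthesize bugs for any
    function covered by passing tests."""
    return [(c.after, c.before) for c in commits]
```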
arXiv Detail & Related papers (2021-05-19T18:40:16Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach that learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
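The deletion-only distinction can be made mechanical: a predicted fix counts as deletion-only if its diff removes lines without inserting any. A small sketch of that check, as one plausible reading of the metric:

```python
import difflib

def is_deletion_only(buggy: str, fixed: str) -> bool:
    """True when the fix only removes lines and inserts nothing.

    Deletion-only fixes are trivial to produce, so reporting them
    separately from substantive (non-deletion) fixes gives a more
    honest picture of repair quality.
    """
    diff = difflib.ndiff(buggy.splitlines(), fixed.splitlines())
    changes = [ln for ln in diff if ln.startswith(("+ ", "- "))]
    return bool(changes) and all(ln.startswith("- ") for ln in changes)

print(is_deletion_only("a\nbug\nb", "a\nb"))       # True: only removed
print(is_deletion_only("a\nbug\nb", "a\nfix\nb"))  # False: inserted a line
```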
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.