Related papers: Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

URL: http://arxiv.org/abs/2401.03069v4
Date: Tue, 22 Oct 2024 14:50:29 GMT
Title: Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study
Authors: Mehil B. Shah, Mohammad Masudur Rahman, Foutse Khomh,
Abstract summary: This paper examines the critical issue of reproducing deep learning bugs. We identify edit actions and useful information that could improve the critical issue. We successfully reproduced 148 out of 165 bugs attempted.
Score: 13.17302533571231
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context: Deep learning has achieved remarkable progress in various domains. However, like any software system, deep learning systems contain bugs, some of which can have severe impacts, as evidenced by crashes involving autonomous vehicles. Despite substantial advancements in deep learning techniques, little research has focused on reproducing deep learning bugs, which is an essential step for their resolution. Existing literature suggests that only 3% of deep learning bugs are reproducible, underscoring the need for further research. Objective: This paper examines the reproducibility of deep learning bugs. We identify edit actions and useful information that could improve the reproducibility of deep learning bugs. Method: First, we construct a dataset of 668 deep-learning bugs from Stack Overflow and GitHub across three frameworks and 22 architectures. Second, out of the 668 bugs, we select 165 bugs using stratified sampling and attempt to determine their reproducibility. While reproducing these bugs, we identify edit actions and useful information for their reproduction. Third, we used the Apriori algorithm to identify useful information and edit actions required to reproduce specific types of bugs. Finally, we conducted a user study involving 22 developers to assess the effectiveness of our findings in real-life settings. Results: We successfully reproduced 148 out of 165 bugs attempted. We identified ten edit actions and five useful types of component information that can help us reproduce the deep learning bugs. With the help of our findings, the developers were able to reproduce 22.92% more bugs and reduce their reproduction time by 24.35%. Conclusions: Our research addresses the critical issue of deep learning bug reproducibility. Practitioners and researchers can leverage our findings to improve deep learning bug reproducibility.

Related papers

BugRepro: Enhancing Android Bug Reproduction with Domain-Specific Knowledge Integration [4.833035081314386]
BugRepro is a novel technique that integrates domain-specific knowledge to enhance the accuracy and efficiency of bug reproduction.<n>BugRepro significantly outperforms two state-of-the-art methods.
arXiv Detail & Related papers (2025-05-20T15:48:34Z)
Leveraging Data Characteristics for Bug Localization in Deep Learning Programs [21.563130049562357]
We propose Theia, which detects and localizes structural bugs in Deep Learning (DL) programs. Our results show that Theia successfully localizes 57/75 structural bugs in 40 buggy programs, whereas NeuraLint, a state-of-the-art approach capable of localizing structural bugs before training localizes 17/75 bugs.
arXiv Detail & Related papers (2024-12-08T01:52:06Z)
Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub. 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
Towards Understanding the Challenges of Bug Localization in Deep Learning Systems [2.9312156642007294]
We conduct a large-scale empirical study to better understand the challenges of localizing bugs in deep-learning systems. First, we determine the bug localization performance of four existing techniques using 2,365 bugs from deep-learning systems and 2,913 from traditional software. Second, we evaluate how different bug types in deep learning systems impact bug localization.
arXiv Detail & Related papers (2024-02-01T21:17:42Z)
An Empirical Study on Bugs Inside PyTorch: A Replication Study [10.848682558737494]
We characterize bugs in the PyTorch library, a very popular deep learning framework. Our results highlight that PyTorch bugs are more like traditional software projects bugs, than related to deep learning characteristics.
arXiv Detail & Related papers (2023-07-25T19:23:55Z)
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models [28.69675481931385]
We propose AdbGPT, a new lightweight approach to automatically reproduce the bugs from bug reports through prompt engineering. AdbGPT leverages few-shot learning and chain-of-thought reasoning to elicit human knowledge and logical reasoning from LLMs. Our evaluations demonstrate the effectiveness and efficiency of our AdbGPT to reproduce 81.3% of bug reports in 253.6 seconds.
arXiv Detail & Related papers (2023-06-03T03:03:52Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization. We provide a general benchmark with a diversity of real and synthetic Java bugs. We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets. We define a uniform evaluation setup including a new formalization of the annotation error detection task. We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
Self-Supervised Bug Detection and Repair [27.46717890823656]
We present BugLab, an approach for self-supervised learning of bug detection and repair. A Python implementation of BugLab improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs.
arXiv Detail & Related papers (2021-05-26T18:41:05Z)
Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques. We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process. We derive two techniques that can speed up the active learning loop such as partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
Advaita: Bug Duplicity Detection System [1.9624064951902522]
Duplicate bugs rate (% of duplicate bugs) are in the range from single digit (1 to 9%) to double digits (40%) based on the product maturity, size of the code and number of engineers working on the project. Detecting duplicity deals with identifying whether any two bugs convey the same meaning. This approach considers multiple sets of features viz. basic text statistical features, semantic features and contextual features.
arXiv Detail & Related papers (2020-01-24T04:48:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.