Understanding Bugs in Multi-Language Deep Learning Frameworks
- URL: http://arxiv.org/abs/2303.02695v1
- Date: Sun, 5 Mar 2023 15:19:37 GMT
- Title: Understanding Bugs in Multi-Language Deep Learning Frameworks
- Authors: Zengyang Li, Sicheng Wang, Wenshuo Wang, Peng Liang, Ran Mo, Bing Li
- Abstract summary: Deep learning frameworks (DLFs) are suffering from bugs caused by the use of multiple programming languages (PLs).
We analyzed 1497 bugs in three MPL DLFs, namely MXNet, PyTorch, and TensorFlow.
The PL combination of Python and C/C++ is used in fixing more than 92% of MPL bugs in all DLFs.
- Score: 12.524231041454044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning frameworks (DLFs) have been playing an increasingly important
role in this intelligence age since they act as a basic infrastructure for an
increasingly wide range of AI-based applications. Meanwhile, as
multi-programming-language (MPL) software systems, DLFs are inevitably
suffering from bugs caused by the use of multiple programming languages (PLs).
Hence, it is of paramount significance to understand the bugs (especially the
bugs involving multiple PLs, i.e., MPL bugs) of DLFs, which can provide a
foundation for preventing, detecting, and resolving bugs in the development of
DLFs. To this end, we manually analyzed 1497 bugs in three MPL DLFs, namely
MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12
types (e.g., algorithm design bugs and memory bugs) according to their bug
labels and characteristics. Second, we further explored the impacts of
different bug types on the development of DLFs, and found that deployment bugs
and memory bugs have the most negative impact on DLF development, each in
different respects. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet,
PyTorch, and TensorFlow are MPL bugs, respectively; the PL combination of
Python and C/C++ is the most used, involved in fixing more than 92% of MPL bugs in all DLFs.
Finally, the code change complexity of MPL bug fixes is significantly greater
than that of single-programming-language (SPL) bug fixes in all three DLFs,
while in PyTorch MPL bug fixes have longer open time and greater communication
complexity than SPL bug fixes. These results provide insights for bug
management in DLFs.
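One plausible way to operationalize the MPL/SPL distinction is to look at the languages of the files changed by a bug-fixing commit. Below is a minimal sketch, assuming a hand-made extension-to-language map; it is illustrative, not the paper's actual classification procedure (which relied on bug labels and manual analysis).

```python
# Hypothetical mapping from file extension to programming language.
EXT_TO_LANG = {
    ".py": "Python",
    ".c": "C/C++", ".cc": "C/C++", ".cpp": "C/C++",
    ".h": "C/C++", ".hpp": "C/C++", ".cu": "C/C++",
    ".sh": "Shell", ".cmake": "CMake", ".java": "Java",
}

def classify_fix(changed_files):
    """Classify a bug fix as SPL or MPL from the languages of its changed files."""
    langs = set()
    for path in changed_files:
        if "." in path:
            ext = "." + path.rsplit(".", 1)[-1]
            if ext in EXT_TO_LANG:
                langs.add(EXT_TO_LANG[ext])
    if not langs:
        return "unknown", langs
    return ("MPL" if len(langs) > 1 else "SPL"), langs

# A fix touching the Python front end and a C++ kernel counts as an MPL
# bug fixed with the Python + C/C++ combination.
kind, langs = classify_fix(["python/ops.py", "src/kernel.cc", "src/kernel.h"])
print(kind, sorted(langs))  # MPL ['C/C++', 'Python']
```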
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
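The training-free critique-and-correct loop can be sketched as follows; `llm` is a hypothetical callable standing in for any chat-model API, and Python's own bytecode compiler is used as a cheap stand-in for real toolchain feedback.

```python
def self_critique_repair(llm, task, max_rounds=3):
    """Iteratively let the model critique and fix its own code.

    `llm` is an assumed prompt -> code-string callable, not an API
    from the paper; compiler diagnostics drive each critique round.
    """
    code = llm(f"Write Python code for: {task}")
    for _ in range(max_rounds):
        try:
            compile(code, "<candidate>", "exec")
            break  # compiles cleanly; stop iterating
        except SyntaxError as err:
            code = llm(
                f"Task: {task}\nYour previous code:\n{code}\n"
                f"Compiler feedback: {err}\n"
                "Name the bug type, then return a corrected version."
            )
    return code
```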
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- The Fact Selection Problem in LLM-Based Program Repair [3.7005619077967133]
We show that each fact, ranging from simple syntactic details like code context to semantic information previously unexplored in the context of Python projects, is beneficial.
Importantly, we discovered that the effectiveness of program repair prompts is non-monotonic over the number of used facts.
We develop a basic statistical model, named Maniple, which selects facts specific to a given bug to include in the prompt.
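Maniple itself is a statistical model; the non-monotonicity finding can be illustrated with a greedy selector that stops adding facts at the first non-improving step. The `score` function below is an assumed stand-in, not the paper's model.

```python
def select_facts(facts, score):
    """Greedily add facts to the prompt while the estimated score improves.

    `score(subset)` is an assumed estimator of how likely a prompt built
    from these facts is to yield a correct patch. Because effectiveness
    is non-monotonic in the number of facts, adding more facts can hurt,
    so we stop early rather than using all of them.
    """
    chosen, best = [], score([])
    remaining = list(facts)
    while remaining:
        candidate = max(remaining, key=lambda f: score(chosen + [f]))
        if score(chosen + [candidate]) <= best:
            break  # more facts would not help this bug's prompt
        chosen.append(candidate)
        best = score(chosen)
        remaining.remove(candidate)
    return chosen
```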
arXiv Detail & Related papers (2024-04-08T13:41:32Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
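The RTT pipeline is only two model calls; a minimal sketch follows, where `translate` is a hypothetical model interface, not an API from the paper.

```python
def round_trip_repair(translate, buggy_code, pivot="Java"):
    """Repair code by translating it to a pivot language and back.

    `translate(code, source, target)` is an assumed model interface.
    The hypothesis: regenerating the program through a pivot language
    smooths out bugs, much as round-trip translation fixes grammar in
    natural-language sentences.
    """
    pivot_code = translate(buggy_code, source="Python", target=pivot)
    return translate(pivot_code, source=pivot, target="Python")
```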
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
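Zero-shot evaluation on such a benchmark reduces to a simple harness. The item fields and `run_tests` oracle below are assumptions for illustration, not DebugBench's actual schema.

```python
def evaluate_zero_shot(model, benchmark, run_tests):
    """Score a model on a debugging benchmark, broken down by bug category.

    Each item is assumed to carry `buggy_code`, `language`, `category`,
    and `tests`; `run_tests` is an assumed oracle executing the fixed
    code against the tests.
    """
    passed, total = {}, {}
    for item in benchmark:
        fix = model(f"Fix this {item['language']} code:\n{item['buggy_code']}")
        cat = item["category"]
        total[cat] = total.get(cat, 0) + 1
        if run_tests(fix, item["tests"]):
            passed[cat] = passed.get(cat, 0) + 1
    return {c: passed.get(c, 0) / total[c] for c in total}
```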
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
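One way such a testing framework can work is to turn known facts into questions and compare the model's answers against them. The triple format and `llm` call below are assumptions, not FactChecker's actual design.

```python
def test_factuality(llm, triples):
    """Probe an LLM with questions derived from known facts.

    `triples` is an assumed knowledge source of (subject, relation,
    object) facts; a mismatch flags a potential factual error for review.
    """
    failures = []
    for subject, relation, obj in triples:
        question = f"What is the {relation} of {subject}? Answer briefly."
        answer = llm(question)
        if obj.lower() not in answer.lower():
            failures.append((question, answer, obj))
    return failures
```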
arXiv Detail & Related papers (2024-01-01T14:02:27Z)
- GlotLID: Language Identification for Low-Resource Languages [51.38634652914054]
GlotLID-M is an LID model that satisfies the desiderata of wide coverage, reliability and efficiency.
It identifies 1665 languages, a large increase in coverage compared to prior work.
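GlotLID-M is released as a fastText model, so prediction follows the standard fastText API. A minimal usage sketch; the local model filename is an assumption (fetch the released model file first).

```python
import fasttext

# Assumed local filename for the downloaded GlotLID model file.
model = fasttext.load_model("glotlid_model.bin")

# fastText returns labels (e.g. "__label__eng_Latn") with probabilities.
labels, probs = model.predict("This is an English sentence.", k=3)
for label, prob in zip(labels, probs):
    print(label.replace("__label__", ""), round(float(prob), 3))
```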
arXiv Detail & Related papers (2023-10-24T23:45:57Z)
- A Comprehensive Empirical Study of Bugs in Open-Source Federated Learning Frameworks [11.835104059182832]
Federated learning (FL) is a distributed machine learning (ML) paradigm that allows multiple clients to collaboratively train ML models without exposing their private data.
To foster the application of FL, a variety of FL frameworks have been proposed, allowing non-experts to easily train ML models.
We conduct the first empirical study to comprehensively collect, taxonomize, and characterize bugs in FL frameworks.
arXiv Detail & Related papers (2023-08-09T15:14:16Z)
- Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation [5.079750706023254]
Bugsplainer generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits.
Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard.
We also conduct a developer study involving 20 participants where the explanations from Bugsplainer were found to be more accurate, more precise, more concise and more useful than the baselines.
arXiv Detail & Related papers (2022-12-08T22:19:45Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task.
We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables.
Our results show a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
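The gap between a myopic policy and one that accounts for downstream uncertainty shows up even in a toy version of the problem. The sketch below is illustrative, not ADPTriage's actual formulation: `value` stands in for an approximate cost-to-go learned over stochastic bug arrivals and developer schedules.

```python
def triage(bug, free_devs, fix_time, value):
    """Assign a bug by a one-step Bellman lookahead.

    `fix_time[d][bug]` is an estimated fixing time and `value(state)` an
    assumed approximation of expected future cost (the ADP component).
    The myopic policy is recovered with value = lambda s: 0.
    """
    def bellman_cost(dev):
        next_state = frozenset(free_devs - {dev})  # dev becomes busy
        return fix_time[dev][bug] + value(next_state)
    return min(free_devs, key=bellman_cost)
```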
arXiv Detail & Related papers (2022-11-02T04:42:21Z)
- DABT: A Dependency-aware Bug Triaging Method [0.0]
We introduce a bug triaging method, called Dependency-aware Bug Triaging (DABT), which leverages natural language processing and integer programming to assign bugs to appropriate developers.
Our results show that DABT reduces the number of overdue bugs by up to 12%.
It also decreases the average fixing time of the bugs by half.
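A minimal sketch of the integer-programming core of such a triaging method, written with PuLP; the bugs, fixing-time estimates, and capacities are made-up illustrative data, and DABT's dependency constraints are omitted for brevity.

```python
import pulp

bugs = ["B1", "B2", "B3"]
devs = ["alice", "bob"]
# Assumed estimated fixing time (days) per developer per bug.
fix_time = {"alice": {"B1": 2, "B2": 5, "B3": 3},
            "bob":   {"B1": 4, "B2": 2, "B3": 6}}
capacity = {"alice": 2, "bob": 2}  # max concurrent bugs per developer

prob = pulp.LpProblem("bug_triage", pulp.LpMinimize)
x = pulp.LpVariable.dicts("assign", (devs, bugs), cat="Binary")

# Objective: minimize total estimated fixing time.
prob += pulp.lpSum(fix_time[d][b] * x[d][b] for d in devs for b in bugs)
# Every bug is assigned to exactly one developer.
for b in bugs:
    prob += pulp.lpSum(x[d][b] for d in devs) == 1
# Respect each developer's workload capacity.
for d in devs:
    prob += pulp.lpSum(x[d][b] for b in bugs) <= capacity[d]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for d in devs:
    print(d, "->", [b for b in bugs if x[d][b].value() == 1])
```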
arXiv Detail & Related papers (2021-04-26T17:35:42Z)