Demystifying Dependency Bugs in Deep Learning Stack
- URL: http://arxiv.org/abs/2207.10347v2
- Date: Fri, 1 Sep 2023 16:54:38 GMT
- Title: Demystifying Dependency Bugs in Deep Learning Stack
- Authors: Kaifeng Huang, Bihuan Chen, Susheng Wu, Junmin Cao, Lei Ma, Xin Peng
- Abstract summary: This paper characterizes the symptoms, root causes and fix patterns of dependency bugs (DBs) across the whole Deep Learning stack.
Its findings shed light on practical implications for dependency management.
- Score: 7.488059560714949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning (DL) applications, built upon a heterogeneous and complex DL
stack (e.g., Nvidia GPU, Linux, CUDA driver, Python runtime, and TensorFlow),
are subject to software and hardware dependencies across the DL stack. One
challenge in dependency management across the entire engineering lifecycle is
posed by the asynchronous and radical evolution and the complex version
constraints among dependencies. Developers may introduce dependency bugs (DBs)
in selecting, using and maintaining dependencies. However, the characteristics
of DBs in the DL stack are still under-investigated, hindering practical solutions
to dependency management in the DL stack. To bridge this gap, this paper presents
the first comprehensive study to characterize symptoms, root causes and fix
patterns of DBs across the whole DL stack with 446 DBs collected from
StackOverflow posts and GitHub issues. For each DB, we first investigate the
symptom as well as the lifecycle stage and dependency where the symptom is
exposed. Then, we analyze the root cause as well as the lifecycle stage and
dependency where the root cause is introduced. Finally, we explore the fix
pattern and the knowledge sources that are used to fix it. Our findings from
this study shed light on practical implications for dependency management.
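The abstract attributes much of the difficulty to complex version constraints among layers of the stack. The sketch below is not taken from the paper; it only illustrates, under stated assumptions, how cross-layer constraints of the form "component A needs component B within a version range" can be checked mechanically. The component names, the version numbers, and the helpers `Constraint`, `parse_version` and `find_violations` are all hypothetical, and the numbers are invented rather than real compatibility data. Ranges are assumed to be half-open, `[low, high)`.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version like '11.0' into a comparable tuple.

    Trailing zeros are dropped so that '12' and '12.0' compare equal.
    """
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass(frozen=True)
class Constraint:
    """Hypothetical rule: `requirer` needs `target` in the range [low, high)."""
    requirer: str
    target: str
    low: str
    high: str


def find_violations(installed: dict[str, str], constraints: list[Constraint]) -> list[str]:
    """Return one message per constraint that the installed stack breaks."""
    problems = []
    for c in constraints:
        if c.requirer not in installed:
            continue  # a rule only applies when its requirer is installed
        if c.target not in installed:
            problems.append(f"{c.requirer} needs {c.target}, which is missing")
            continue
        found = parse_version(installed[c.target])
        if not (parse_version(c.low) <= found < parse_version(c.high)):
            problems.append(
                f"{c.requirer} needs {c.target} in [{c.low}, {c.high}), "
                f"found {installed[c.target]}"
            )
    return problems


# Illustrative stack and rules; every name and number here is made up.
example_stack = {"dl_framework": "2.0", "python_runtime": "3.7", "cuda_driver": "10.1"}
example_constraints = [
    Constraint("dl_framework", "python_runtime", "3.6", "3.9"),
    Constraint("dl_framework", "cuda_driver", "11.0", "11.1"),
]

for message in find_violations(example_stack, example_constraints):
    print(message)
```

Here the runtime satisfies its rule but the driver does not, so only the driver mismatch is reported. A real resolver would also need to handle pre-release tags and the non-Python layers (GPU, OS, driver) that package metadata often does not describe, which is part of why such constraints are hard to manage across the whole stack.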