Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem
- URL: http://arxiv.org/abs/2310.12598v2
- Date: Fri, 5 Jan 2024 04:32:12 GMT
- Title: Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem
- Authors: Yun Peng, Ruida Hu, Ruoke Wang, Cuiyun Gao, Shuqing Li, Michael R. Lyu
- Abstract summary: Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries.
Third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors.
Moreover, endeavors have been made to automatically infer dependencies.
- Score: 38.44692482370243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python is widely used in the open-source community, largely owing to the
extensive support from diverse third-party libraries within the PyPI ecosystem.
Nevertheless, the utilization of third-party libraries can potentially lead to
conflicts in dependencies, prompting researchers to develop dependency conflict
detectors. Moreover, endeavors have been made to automatically infer
dependencies. These approaches focus on version-level checks and inference,
based on the assumption that configurations of libraries in the PyPI ecosystem
are correct. However, our study reveals that this assumption is not universally
valid, and relying solely on version-level checks proves inadequate in ensuring
compatible run-time environments. In this paper, we conduct an empirical study
to comprehensively investigate configuration issues in the PyPI ecosystem.
Specifically, we propose PyConf, a source-level detector for potential
configuration issues. PyConf employs three distinct checks, targeting the
setup, packaging, and usage stages of libraries, respectively. To evaluate
the effectiveness of the current automatic dependency inference approaches, we
build a benchmark called VLibs, comprising library releases that pass all three
checks of PyConf. We identify 15 kinds of configuration issues and find that
183,864 library releases suffer from potential configuration issues.
Remarkably, 68% of these issues can only be detected via the source-level
check. Our experiment results show that the most advanced automatic dependency
inference approach, PyEGo, can successfully infer dependencies for only 65% of
library releases. The primary failures stem from dependency conflicts and the
absence of required libraries in the generated configurations. Based on the
empirical results, we derive six findings and draw two implications for
open-source developers and future research in automatic dependency inference.
Related papers
- A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread use of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
(2024-06-17T09:33:49Z)
- See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase.
Developers are wary of library updates, even to fix vulnerabilities, citing either a lack of awareness or a migration effort that outweighs the benefit of updating.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
(2024-05-15T03:57:27Z)
- Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
(2024-04-26T13:27:04Z)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and sharing the intervened-upon models with others.
(2024-03-12T16:46:54Z)
- An Empirical Study on Bugs Inside PyTorch: A Replication Study [10.848682558737494]
We characterize bugs in the PyTorch library, a very popular deep learning framework.
Our results highlight that PyTorch bugs resemble bugs in traditional software projects more than they relate to deep learning characteristics.
(2023-07-25T19:23:55Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment with and extend the library for their own purposes.
(2023-04-21T10:00:22Z)
- Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
(2022-04-29T01:54:54Z)
- PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous Hidden Markov Models (HHMMs).
PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criteria, and semi-supervised training.
PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
(2022-01-12T07:32:36Z)
- PySAD: A Streaming Anomaly Detection Framework in Python [0.0]
PySAD is an open-source Python framework for anomaly detection on streaming data.
PySAD builds upon popular open-source frameworks such as PyOD and scikit-learn.
(2020-09-05T17:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.