Related papers: Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem

Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem

URL: http://arxiv.org/abs/2310.12598v2
Date: Fri, 5 Jan 2024 04:32:12 GMT
Title: Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem
Authors: Yun Peng, Ruida Hu, Ruoke Wang, Cuiyun Gao, Shuqing Li, Michael R. Lyu
Abstract summary: Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries. Third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors. endeavors have been made to automatically infer dependencies.
Score: 38.44692482370243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries within the PyPI ecosystem. Nevertheless, the utilization of third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors. Moreover, endeavors have been made to automatically infer dependencies. These approaches focus on version-level checks and inference, based on the assumption that configurations of libraries in the PyPI ecosystem are correct. However, our study reveals that this assumption is not universally valid, and relying solely on version-level checks proves inadequate in ensuring compatible run-time environments. In this paper, we conduct an empirical study to comprehensively study the configuration issues in the PyPI ecosystem. Specifically, we propose PyConf, a source-level detector, for detecting potential configuration issues. PyConf employs three distinct checks, targeting the setup, packing, and usage stages of libraries, respectively. To evaluate the effectiveness of the current automatic dependency inference approaches, we build a benchmark called VLibs, comprising library releases that pass all three checks of PyConf. We identify 15 kinds of configuration issues and find that 183,864 library releases suffer from potential configuration issues. Remarkably, 68% of these issues can only be detected via the source-level check. Our experiment results show that the most advanced automatic dependency inference approach, PyEGo, can successfully infer dependencies for only 65% of library releases. The primary failures stem from dependency conflicts and the absence of required libraries in the generated configurations. Based on the empirical results, we derive six findings and draw two implications for open-source developers and future research in automatic dependency inference.

Related papers

PyPitfall: Dependency Chaos and Software Supply Chain Vulnerabilities in Python [1.2644387713029346]
This paper introduces PyPitfall, a quantitative analysis of vulnerable dependencies across the PyPI ecosystem.<n>We analyzed the dependency structures of 378,573 PyPI packages and identified 4,655 packages that explicitly require at least one known-vulnerable version.<n>We aim to raise awareness of Python software supply chain security by characterizing the ecosystem-wide dependency landscape.
arXiv Detail & Related papers (2025-07-24T03:58:18Z)
How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow [3.076436880934678]
We conduct an empirical study of six Large Language Models (LLMs)<n>We analyze the types of libraries they import, the characteristics of those libraries, and the extent to which the recommendations are usable out of the box.<n>Our results show that LLMs predominantly favour third-party libraries over standard ones, and often recommend mature, popular, and permissively licensed dependencies.
arXiv Detail & Related papers (2025-07-14T21:35:29Z)
Analyzing the Usage of Donation Platforms for PyPI Libraries [91.97201077607862]
This study analyzes the adoption of donation platforms in the PyPI ecosystem. GitHub Sponsors is the dominant platform, though many PyPI-listed links are outdated.
arXiv Detail & Related papers (2025-03-11T10:27:31Z)
Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs [10.559292676550319]
Python developers must manually identify and resolve environment dependencies and version constraints of third-party modules and Python interpreters. Traditional approaches face limitations due to the variety of dependency error types, large sets of possible module versions, and conflicts among. This study explores the potential of using large language models (LLMs) to automatically fix dependency issues in Python programs.
arXiv Detail & Related papers (2025-01-27T16:45:34Z)
PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers. We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread of libraries within modern software ecosystems creates complex networks of dependencies. One mitigation strategy involves reducing dependencies; libraries with zero dependencies become to self-contained. This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z)
See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase. Developers are wary of library updates, even to fix vulnerabilities, citing that being unaware, or that the migration effort to update outweighs the decision. In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z)
Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions [79.72930339711478]
$textbfpyvene$ is an open-source library that supports customizable interventions on a range of different PyTorch modules. We show how $textbfpyvene$ provides a unified framework for performing interventions on neural models and sharing the intervened upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z)
An Empirical Study on Bugs Inside PyTorch: A Replication Study [10.848682558737494]
We characterize bugs in the PyTorch library, a very popular deep learning framework. Our results highlight that PyTorch bugs are more like traditional software projects bugs, than related to deep learning characteristics.
arXiv Detail & Related papers (2023-07-25T19:23:55Z)
SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks. It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches. We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code. It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z)
PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous-Hidden Markov Models (HHMMs) PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criterias, and semi-supervised training. PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
arXiv Detail & Related papers (2022-01-12T07:32:36Z)
PySAD: A Streaming Anomaly Detection Framework in Python [0.0]
PySAD is an open-source python framework for anomaly detection on streaming data. PySAD builds upon popular open-source frameworks such as PyOD and scikit-learn.
arXiv Detail & Related papers (2020-09-05T17:41:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.