How Deep Does Your Dependency Tree Go? An Empirical Study of Dependency Amplification Across 10 Package Ecosystems
- URL: http://arxiv.org/abs/2512.14739v1
- Date: Fri, 12 Dec 2025 05:53:32 GMT
- Title: How Deep Does Your Dependency Tree Go? An Empirical Study of Dependency Amplification Across 10 Package Ecosystems
- Authors: Jahidul Arafat,
- Abstract summary: We study 500 projects across ten major ecosystems, including Maven, npm Registry, RubyGems, Go Modules, PyPI, CocoaPods, and Pub.<n>We find significant differences with large effect sizes in 22 of 45 pairwise comparisons, challenging the assumption that npm has the highest amplification.<n>Our findings suggest adopting ecosystem specific security strategies, including systematic auditing for Maven environments, targeted outlier detection for npm and RubyGems, and continuation of current practices for ecosystems with controlled amplification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern software development relies on package ecosystems where a single declared dependency can pull in many additional transitive packages. This dependency amplification, defined as the ratio of transitive to direct dependencies, has major implications for software supply chain security, yet amplification patterns across ecosystems have not been compared at scale. We present an empirical study of 500 projects across ten major ecosystems, including Maven Central for Java, npm Registry for JavaScript, crates io for Rust, PyPI for Python, NuGet Gallery for dot NET, RubyGems for Ruby, Go Modules for Go, Packagist for PHP, CocoaPods for Swift and Objective C, and Pub for Dart. Our analysis shows that Maven exhibits mean amplification of 24.70 times, compared to 4.48 times for Go Modules, 4.32 times for npm, and 0.32 times for CocoaPods. We find significant differences with large effect sizes in 22 of 45 pairwise comparisons, challenging the assumption that npm has the highest amplification due to its many small purpose packages. We observe that 28 percent of Maven projects exceed 10 times amplification, indicating a systematic pattern rather than isolated outliers, compared to 14 percent for RubyGems, 12 percent for npm, and zero percent for Cargo, PyPI, Packagist, CocoaPods, and Pub. We attribute these differences to ecosystem design choices such as dependency resolution behavior, standard library completeness, and platform constraints. Our findings suggest adopting ecosystem specific security strategies, including systematic auditing for Maven environments, targeted outlier detection for npm and RubyGems, and continuation of current practices for ecosystems with controlled amplification. We provide a full replication package with data and analysis scripts.
Related papers
- Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries.<n>This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers.<n>We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z) - PyPitfall: Dependency Chaos and Software Supply Chain Vulnerabilities in Python [1.2644387713029346]
This paper introduces PyPitfall, a quantitative analysis of vulnerable dependencies across the PyPI ecosystem.<n>We analyzed the dependency structures of 378,573 PyPI packages and identified 4,655 packages that explicitly require at least one known-vulnerable version.<n>We aim to raise awareness of Python software supply chain security by characterizing the ecosystem-wide dependency landscape.
arXiv Detail & Related papers (2025-07-24T03:58:18Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in software engineering domain.<n>Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.<n>To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.<n>PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers.<n>We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z) - The Stackage Repository: An Exploratory Study of its Evolution [0.0]
This paper conducts empirical research about the evolution of Stackage considering monad packages.
To the best of our knowledge, this is the first large-scale analysis of the evolution of the Stackage repository regarding packages used and monads.
arXiv Detail & Related papers (2023-10-16T23:42:47Z) - On the Feasibility of Cross-Language Detection of Malicious Packages in
npm and PyPI [6.935278888313423]
Malicious users started to spread malware by publishing open-source packages containing malicious code.
Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem.
We present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI.
arXiv Detail & Related papers (2023-10-14T12:32:51Z) - Analysis of Library Dependency Networks of Package Managers Used in iOS
Development [3.46067608522128]
The library dependency network in the Swift ecosystem encompasses libraries from CocoaPods, Carthage and Swift Package Manager (PM)
Although CocoaPods is the package manager with the biggest set of libraries, the difference to other package managers is not as big as expected.
Swift PM is becoming more and more popular, resulting in a gradual slow-down of the growth of the other two package managers.
arXiv Detail & Related papers (2023-05-18T12:14:19Z) - DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z) - PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous-Hidden Markov Models (HHMMs)
PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criterias, and semi-supervised training.
PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
arXiv Detail & Related papers (2022-01-12T07:32:36Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces textttscikit-dimension, an open-source Python package for intrinsic dimension estimation.
textttscikit-dimension package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z) - pyBART: Evidence-based Syntactic Transformations for IE [52.93947844555369]
We present pyBART, an easy-to-use open-source Python library for converting English UD trees to Enhanced UD graphs or to our representation.
When evaluated in a pattern-based relation extraction scenario, our representation results in higher extraction scores than Enhanced UD, while requiring fewer patterns.
arXiv Detail & Related papers (2020-05-04T07:38:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.