Characterizing Dependency Update Practice of NPM, PyPI and Cargo Packages
- URL: http://arxiv.org/abs/2403.17382v1
- Date: Tue, 26 Mar 2024 05:01:53 GMT
- Title: Characterizing Dependency Update Practice of NPM, PyPI and Cargo Packages
- Authors: Imranur Rahman, Nusrat Zahan, Stephen Magill, William Enck, Laurie Williams,
- Abstract summary: Keeping dependencies up-to-date prevents software supply chain attacks through outdated and vulnerable dependencies.
We propose two update metrics to measure the updatedness of dependencies and updatedness of vulnerable dependencies.
We conduct a large-scale empirical study of update metrics with 2.9M packages, 66.8M package versions, and 26.8M unique package-dependency relations.
- Score: 7.739923421146855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keeping dependencies up-to-date prevents software supply chain attacks through outdated and vulnerable dependencies. Developers may use packages' dependency update practice as one of the selection criteria for choosing a package as a dependency. However, the lack of metrics characterizing packages' dependency update practice makes this assessment difficult. To measure the up-to-date characteristics of packages, we focus on the dependency management aspect and propose two update metrics: Time-Out-Of-Date (TOOD) and Post-Fix-Exposure-Time (PFET), to measure the updatedness of dependencies and updatedness of vulnerable dependencies, respectively. We design an algorithm to stabilize the dependency relationships in different time intervals and compute the proposed metrics for each package. Using our proposed metrics, we conduct a large-scale empirical study of update metrics with 2.9M packages, 66.8M package versions, and 26.8M unique package-dependency relations in NPM, PyPI, and Cargo, ranging from the year 2004 to 2023. We analyze the characteristics of the proposed metrics for capturing packages' dependency update practice in the three ecosystems. Given that the TOOD metric generates a greater volume of data than the PFET metric, we further explore the numerical relationship between these metrics to assess their potential as substitutes for vulnerability counts metrics. We find that PyPI packages update dependencies faster than NPM and Cargo. Conversely, Cargo packages update their vulnerable dependencies faster than NPM and PyPI. We also find that the general purpose update metric, TOOD, can be a proxy for the security-focused update metric, PFET.
Related papers
- Faster Releases, Fewer Risks: A Study on Maven Artifact Vulnerabilities and Lifecycle Management [0.14999444543328289]
We analyze the release histories of 10,000 Maven artifacts, covering over 203,000 releases and 1.7 million dependencies.
Our results show an inverse relationship between release speed and dependency outdatedness.
These findings emphasize the importance of accelerated release strategies in reducing security risks.
arXiv Detail & Related papers (2025-03-31T17:32:45Z) - Analyzing the Usage of Donation Platforms for PyPI Libraries [91.97201077607862]
This study analyzes the adoption of donation platforms in the PyPI ecosystem.
GitHub Sponsors is the dominant platform, though many PyPI-listed links are outdated.
arXiv Detail & Related papers (2025-03-11T10:27:31Z) - Pinning Is Futile: You Need More Than Local Dependency Versioning to Defend against Supply Chain Attacks [23.756533975349985]
Recent high-profile incidents in open-source software have raised practitioner attention on software supply chain attacks.
Security practitioners advocate pinning dependency to specific versions rather than floating in version ranges.
We quantify, through counterfactual analysis and simulations, the security and maintenance impact of version constraints in the npm ecosystem.
arXiv Detail & Related papers (2025-02-10T16:50:48Z) - A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent.
This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages.
We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL.
We derive class-specific risk bounds for MI-PLL, while making minimal assumptions.
Our theory reveals a unique phenomenon: that $sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z) - Leveraging the Crowd for Dependency Management: An Empirical Study on the Dependabot Compatibility Score [3.6840775431698893]
We study the efficacy of the compatibility score to help client packages assess the risks involved with accepting a dependency update.
We find that a compatibility score cannot be calculated for 83% of the dependency updates due to the lack of data from the crowd.
We propose metrics that amplify the input from the crowd and demonstrate the ability of those metrics to predict the acceptance of a successful update by client packages.
arXiv Detail & Related papers (2024-03-14T00:26:19Z) - Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata information.
Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z) - Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL)
This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing pessimistic estimation of the sub-optimality gap.
We give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T-1/2) convergence rate, and avoids $textpoly(A_max)$ dependency simultaneously.
arXiv Detail & Related papers (2024-02-11T01:51:15Z) - Seg-metrics: a Python package to compute segmentation metrics [0.6827423171182151]
textttseg-metrics is an open-source Python package for standardized MIS model evaluation.
textttseg-metrics supports multiple file formats and is easily installable through the Python Package Index (PyPI)
arXiv Detail & Related papers (2024-01-12T16:30:54Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t sim p, x'_t sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Dependency Practices for Vulnerability Mitigation [4.710141711181836]
We analyze more than 450 vulnerabilities in the npm ecosystem to understand why dependent packages remain vulnerable.
We identify over 200,000 npm packages that are infected through their dependencies.
We use 9 features to build a prediction model that identifies packages that quickly adopt the vulnerability fix and prevent further propagation of vulnerabilities.
arXiv Detail & Related papers (2023-10-11T19:48:46Z) - When is Agnostic Reinforcement Learning Statistically Tractable? [76.1408672715773]
A new complexity measure, called the emphspanning capacity, depends solely on the set $Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
arXiv Detail & Related papers (2023-10-09T19:40:54Z) - Dependency Update Strategies and Package Characteristics [5.119787101452765]
This study explores the association between package characteristics and the dependency update strategy selected by its dependents.
We study over 112,000 npm packages and use 19 characteristics to build a prediction model that identifies the common dependency update strategy for each package.
arXiv Detail & Related papers (2023-05-25T02:58:21Z) - Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods evaluate using single-agent metrics (marginal metrics)
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
arXiv Detail & Related papers (2023-05-10T16:27:55Z) - Bilateral Dependency Optimization: Defending Against Model-inversion
Attacks [61.78426165008083]
We propose a bilateral dependency optimization (BiDO) strategy to defend against model-inversion attacks.
BiDO achieves the state-of-the-art defense performance for a variety of datasets, classifiers, and MI attacks.
arXiv Detail & Related papers (2022-06-11T10:07:03Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z) - Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimate.
arXiv Detail & Related papers (2021-04-18T02:43:37Z) - Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z) - pyBART: Evidence-based Syntactic Transformations for IE [52.93947844555369]
We present pyBART, an easy-to-use open-source Python library for converting English UD trees to Enhanced UD graphs or to our representation.
When evaluated in a pattern-based relation extraction scenario, our representation results in higher extraction scores than Enhanced UD, while requiring fewer patterns.
arXiv Detail & Related papers (2020-05-04T07:38:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.