PVAC: Package Version Activity Categorizer, Leveraging Semantic Versioning in a Heterogeneous System
- URL: http://arxiv.org/abs/2409.04588v2
- Date: Sat, 31 May 2025 19:38:53 GMT
- Title: PVAC: Package Version Activity Categorizer, Leveraging Semantic Versioning in a Heterogeneous System
- Authors: Shane K. Panter, Luke Hindman, Nasir U. Eisty
- Abstract summary: This research aims to introduce a systematic method and a prototype tool for assessing version activity within heterogeneous package manager ecosystems. We developed a Package Version Activity Categorizer (PVAC) that consists of three components. PVAC parses semantic versioning details from diverse package version strings, enabling consistent categorization and quantitative scoring of version changes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: Modern open-source software ecosystems, such as those managed by GNU/Linux distributions, are composed of numerous packages developed independently by diverse communities. These ecosystems employ package management tools to facilitate software installation and dependency resolution. However, these tools lack robust mechanisms for systematically evaluating the development activity and versioning dynamics within their heterogeneous software environments. Objective: This research aims to introduce a systematic method and a prototype tool for assessing version activity within heterogeneous package manager ecosystems, enabling quantitative analysis of software package updates. Method: We developed a Package Version Activity Categorizer (PVAC) that consists of three components: the Version Categorizer (VC), which categorizes diverse semantic version numbers; a Version Number Delta (VND) component, which calculates a numeric score representing the aggregated semantic version changes across packages at the ecosystem level; and an Activity Categorizer (AC), which categorizes the activity of individual packages within that ecosystem. PVAC utilizes tailored regular expressions to parse semantic versioning details (epoch, major, minor, and patch versions) from diverse package version strings, enabling consistent categorization and quantitative scoring of version changes. Results: PVAC was empirically evaluated using a dataset of 22,535 packages drawn from recent releases of Debian and Ubuntu GNU/Linux distributions. Our findings demonstrate PVAC's effectiveness for accurately categorizing versioning schemes and quantitatively measuring version activity across releases. We provide empirical evidence confirming that semantic versioning, including adapted variations, is predominantly employed across these ecosystems.
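As a rough illustration of the regular-expression parsing the abstract describes, the sketch below extracts epoch, major, minor, and patch fields from Debian-style version strings and reports the most significant field that differs between two versions. The pattern, the names `VERSION_RE`, `ParsedVersion`, `parse_version`, and `first_changed_field`, and the choice to default missing fields to 0 are all assumptions made for this example; they are not PVAC's actual expressions, and the sketch does not reproduce the paper's VND scoring.

```python
import re
from typing import NamedTuple, Optional


class ParsedVersion(NamedTuple):
    """Numeric fields named in the abstract (hypothetical container)."""
    epoch: int
    major: int
    minor: int
    patch: int


# Hypothetical pattern, not PVAC's own: an optional "epoch:" prefix followed
# by up to three dot-separated numeric fields. Any trailing text, such as a
# Debian revision like "-8ubuntu1", is ignored.
VERSION_RE = re.compile(
    r"^(?:(?P<epoch>\d+):)?"
    r"(?P<major>\d+)"
    r"(?:\.(?P<minor>\d+))?"
    r"(?:\.(?P<patch>\d+))?"
)


def parse_version(text: str) -> Optional[ParsedVersion]:
    """Return the parsed fields, or None if the string has no leading number."""
    match = VERSION_RE.match(text)
    if match is None:
        return None
    # Assumption: fields absent from the string are treated as 0.
    return ParsedVersion(
        *(int(match.group(name) or 0) for name in ParsedVersion._fields)
    )


def first_changed_field(old: ParsedVersion, new: ParsedVersion) -> str:
    """Name the most significant field that differs, or 'none' if all match."""
    for name, before, after in zip(ParsedVersion._fields, old, new):
        if before != after:
            return name
    return "none"


if __name__ == "__main__":
    old = parse_version("1:2.36.1-8ubuntu1")
    new = parse_version("1:2.37.0-1")
    print(old, new, first_changed_field(old, new))
```

Strings that do not begin with a number (for example, date- or hash-only schemes with a letter prefix) return `None` here; a fuller categorizer would need additional patterns for such schemes.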
Related papers
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints.
We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC).
Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z)
- Dependency Update Adoption Patterns in the Maven Software Ecosystem [0.0]
Dependency updates protect dependent software components from bugs, security vulnerabilities, and poor code quality. We find adoption latency in the Maven ecosystem follows a log-normal distribution while adoption reach exhibits an exponential decay distribution.
arXiv Detail & Related papers (2025-04-09T22:24:31Z)
- SoK: Towards Reproducibility for Software Packages in Scripting Language Ecosystems [0.0]
This SoK provides an overview of existing research, aiming to highlight future directions. We work out key aspects in current research, systematize identified challenges for software, and map them between the ecosystems. We find that the literature is sparse, focusing on few individual problems and ecosystems.
arXiv Detail & Related papers (2025-03-27T17:10:38Z)
- EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z)
- Rethinking Reuse in Dependency Supply Chains: Initial Analysis of NPM packages at the End of the Chain [2.4969046521751768]
This paper advocates for a shift in software development practices toward minimizing reliance on third-party packages.
We find that these end-of-chain packages offer unique insights, as they play a key role in the ecosystem.
arXiv Detail & Related papers (2025-03-04T17:26:34Z)
- Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC. We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
- Measuring Software Innovation with Open Source Software Development Data [0.0]
This paper introduces a novel measure of software innovation based on open source software (OSS) development activity on GitHub. We examine the dependency growth and release complexity among 350,000 unique releases from 33,000 unique packages across the JavaScript, Python, and Ruby ecosystems over two years post-release.
arXiv Detail & Related papers (2024-11-07T19:11:32Z)
- A First Look at Package-to-Group Mechanism: An Empirical Study of the Linux Distributions [20.491275902894273]
A package-to-group mechanism (P2G) is employed to enable unified installation, uninstallation, and updates of multiple packages at once.
This paper takes Linux distributions as a case study and presents an empirical study focusing on its application trends, evolutionary patterns, group quality, and developer tendencies.
arXiv Detail & Related papers (2024-10-14T03:48:20Z)
- Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette", a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
- Uncovering and Mitigating the Impact of Frozen Package Versions for Fixed-Release Linux [38.53185042161599]
We study the ecosystem gap of fixed-release Linux caused by the evolution of mirrors.
We propose a novel package management approach allowing for separate dependency environments based on native Debian mirrors.
We present a working prototype, named ccenv, which can effectively remedy the inadequacy of current tools.
arXiv Detail & Related papers (2024-08-21T14:01:46Z)
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models [89.63342806812413]
We present an open-source toolkit for evaluating large multi-modality models based on PyTorch.
VLMEvalKit implements over 70 different large multi-modality models, including both proprietary APIs and open-source models.
We host OpenVLM Leaderboard to track the progress of multi-modality learning research.
arXiv Detail & Related papers (2024-07-16T13:06:15Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata information.
Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
- Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures.
For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files.
We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z)
- Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
- Promises and Perils of Mining Software Package Ecosystem Data [10.787686237395816]
Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
arXiv Detail & Related papers (2023-05-29T03:09:48Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- pymdp: A Python library for active inference in discrete state spaces [52.85819390191516]
pymdp is an open-source package for simulating active inference in Python.
We provide the first open-source package for simulating active inference with POMDPs.
arXiv Detail & Related papers (2022-01-11T12:18:44Z)
- Extending the WILDS Benchmark for Unsupervised Adaptation [186.90399201508953]
We present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data.
These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities.
We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods.
arXiv Detail & Related papers (2021-12-09T18:32:38Z)
- An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
- DIETERpy: a Python framework for The Dispatch and Investment Evaluation Tool with Endogenous Renewables [62.997667081978825]
DIETER is an open-source power sector model designed to analyze future settings with very high shares of variable renewable energy sources.
It minimizes overall system costs, including fixed and variable costs of various generation, flexibility and sector coupling options.
We introduce DIETERpy that builds on the existing model version, written in the General Algebraic Modeling System (GAMS) and enhances it with a Python framework.
arXiv Detail & Related papers (2020-10-02T09:27:33Z)
- Instance-Aware Graph Convolutional Network for Multi-Label Classification [55.131166957803345]
Graph convolutional neural network (GCN) has effectively boosted the multi-label image recognition task.
We propose an instance-aware graph convolutional neural network (IA-GCN) framework for multi-label classification.
arXiv Detail & Related papers (2020-08-19T12:49:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.