Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries
- URL: http://arxiv.org/abs/2404.17403v1
- Date: Fri, 26 Apr 2024 13:27:04 GMT
- Title: Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries
- Authors: Alexandros Tsakpinis, Alexander Pretschner,
- Abstract summary: Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
- Score: 91.97201077607862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. But, they can also present a substantial risk if a vulnerability or attack arises and the community fails to promptly address the issue and release a fix due to inactivity. To be able to monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. Based on these repositories, integrated libraries of an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract assigned repository URLs, direct dependencies and use the page rank algorithm to comprehensively analyze the ecosystems from a library and dependency chain perspective. For invalid repository URLs, we derive potential reasons. Both ecosystems show varying accessibility to GitHub repository URLs, depending on the page rank score of the analyzed libraries. For individual libraries, up to 73.8% of PyPI and up to 69.4% of NPM libraries have repository URLs. Within dependency chains, up to 80.1% of PyPI libraries have URLs, while up to 81.1% for NPM. That means, most libraries, especially the ones of increasing importance, can be monitored on GitHub. Among the most common reasons for invalid repository URLs is no URLs being assigned at all, which amounts up to 17.9% for PyPI and up to 39.6% for NPM. Package maintainers should address this issue and update the repository information to enable monitoring of their libraries.
Related papers
- Contributing Back to the Ecosystem: A User Survey of NPM Developers [10.154686574810501]
Survey of 49 developers from the NPM ecosystem.
We find that developers are more likely to maintain their own packages rather than contribute to the ecosystem.
Our results open up new avenues into tool support and research into how to sustain these ecosystems.
arXiv Detail & Related papers (2024-07-01T00:15:55Z) - A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become to self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages [24.8919191161202]
Existing tools can only retrieve repository information for up to 70.5% of PyPI releases.
This paper proposes PyRadar, a novel framework that utilizes the metadata and source distribution to retrieve and validate the repository information for PyPI releases.
arXiv Detail & Related papers (2024-04-25T12:27:59Z) - Less is More? An Empirical Study on Configuration Issues in Python PyPI
Ecosystem [38.44692482370243]
Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries.
Third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors.
endeavors have been made to automatically infer dependencies.
arXiv Detail & Related papers (2023-10-19T09:07:51Z) - torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address this need.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z) - Vulnerability Propagation in Package Managers Used in iOS Development [2.9280059958992286]
Vulnerabilities may be found even in well-known libraries.
The library dependency network in the Swift ecosystem encompasses libraries from CocoaPods, Carthage and Swift Package Manager.
Although most libraries with publicly reported vulnerabilities are written in C, the highest impact of publicly reported vulnerabilities originated from libraries written in native iOS languages.
arXiv Detail & Related papers (2023-05-17T16:22:38Z) - Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
arXiv Detail & Related papers (2022-10-11T12:30:05Z) - Repro: An Open-Source Library for Improving the Reproducibility and
Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z) - PyTorchVideo: A Deep Learning Library for Video Understanding [71.89124881732015]
PyTorchVideo is an open-source deep-learning library for video understanding tasks.
It covers a full stack of video understanding tools including multimodal data loading, transformations, and models.
The library is based on PyTorch and can be used by any training framework.
arXiv Detail & Related papers (2021-11-18T18:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.