Promises and Perils of Mining Software Package Ecosystem Data
- URL: http://arxiv.org/abs/2306.10021v1
- Date: Mon, 29 May 2023 03:09:48 GMT
- Title: Promises and Perils of Mining Software Package Ecosystem Data
- Authors: Raula Gaikovina Kula, Katsuro Inoue, and Christoph Treude
- Abstract summary: Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
- Score: 10.787686237395816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of third-party packages is becoming increasingly popular and has led
to the emergence of large software package ecosystems with a maze of
inter-dependencies. Since the reliance on these ecosystems enables developers
to reduce development effort and increase productivity, it has attracted the
interest of researchers: understanding the infrastructure and dynamics of
package ecosystems has given rise to approaches for better code reuse,
automated updates, and the avoidance of vulnerabilities, to name a few
examples. But the reality of these ecosystems also poses challenges to software
engineering researchers, such as: How do we obtain the complete network of
dependencies along with the corresponding versioning information? What are the
boundaries of these package ecosystems? How do we consistently detect
dependencies that are declared but not used? How do we consistently identify
developers within a package ecosystem? How much of the ecosystem do we need to
understand to analyse a single component? How well do our approaches generalise
across different programming languages and package ecosystems? In this chapter,
we review promises and perils of mining the rich data related to software
package ecosystems available to software engineering researchers.
Related papers
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
(2024-10-03T05:45:26Z)
- An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries [52.23798016734889]
This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries.
The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges.
(2024-09-27T16:20:20Z)
- An Empirical Study on Package-Level Deprecation in Python Ecosystem [6.0347124337922144]
Python, a widely adopted programming language, is renowned for its extensive and diverse third-party package ecosystem.
A significant number of OSS packages within the Python ecosystem are poorly maintained, leading to potential risks in functionality and security.
This paper investigates the current practices of announcing, receiving, and handling package-level deprecation in the Python ecosystem.
(2024-08-19T18:08:21Z)
- Contributing Back to the Ecosystem: A User Survey of NPM Developers [10.154686574810501]
Survey of 49 developers from the NPM ecosystem.
We find that developers are more likely to maintain their own packages than to contribute to the ecosystem.
Our results open up new avenues into tool support and research into how to sustain these ecosystems.
(2024-07-01T00:15:55Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
(2024-06-03T15:20:06Z)
- Biomedical Open Source Software: Crucial Packages and Hidden Heroes [2.3960586265742574]
We map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems.
We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.
(2024-04-10T01:22:02Z)
- An Introduction to Software Ecosystems [7.574742446357262]
This chapter defines and presents different kinds of software ecosystems.
The focus is on the development, tooling and analytics aspects of software ecosystems.
The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems.
(2023-07-28T17:58:59Z)
- The Life and Death of Software Ecosystems [5.043784941542819]
We explore two aspects that contribute to a healthy ecosystem, related to the attraction (and detraction) and the death of ecosystems.
To function and survive, ecosystems need to attract people, get them on-boarded and retain them.
(2023-05-28T23:43:19Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
(2023-05-08T15:24:23Z)
- Ecosystem Graphs: The Social Footprint of Foundation Models [64.02855828418608]
We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem.
Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships.
(2023-03-28T07:18:29Z)
- Data Science for Engineers: A Teaching Ecosystem [59.00739310930656]
We describe an ecosystem for teaching data science to engineers at the Faculty of Physical and Mathematical Sciences, Universidad de Chile.
This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional environments.
By sharing our teaching principles and the innovative components of our approach to teaching DS, we hope our experience can be useful to those developing their own DS programmes and ecosystems.
(2021-01-14T14:17:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.