Promises and Perils of Mining Software Package Ecosystem Data
- URL: http://arxiv.org/abs/2306.10021v1
- Date: Mon, 29 May 2023 03:09:48 GMT
- Title: Promises and Perils of Mining Software Package Ecosystem Data
- Authors: Raula Gaikovina Kula, Katsuro Inoue, and Christoph Treude
- Abstract summary: Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
- Score: 10.787686237395816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of third-party packages is becoming increasingly popular and has led
to the emergence of large software package ecosystems with a maze of
inter-dependencies. Since the reliance on these ecosystems enables developers
to reduce development effort and increase productivity, it has attracted the
interest of researchers: understanding the infrastructure and dynamics of
package ecosystems has given rise to approaches for better code reuse,
automated updates, and the avoidance of vulnerabilities, to name a few
examples. But the reality of these ecosystems also poses challenges to software
engineering researchers, such as: How do we obtain the complete network of
dependencies along with the corresponding versioning information? What are the
boundaries of these package ecosystems? How do we consistently detect
dependencies that are declared but not used? How do we consistently identify
developers within a package ecosystem? How much of the ecosystem do we need to
understand to analyse a single component? How well do our approaches generalise
across different programming languages and package ecosystems? In this chapter,
we review promises and perils of mining the rich data related to software
package ecosystems available to software engineering researchers.
Related papers
- Tracking Down Software Cluster Bombs: A Current State Analysis of the Free/Libre and Open Source Software (FLOSS) Ecosystem [0.43981305860983705]
This study provides a summary of the current state of available FLOSS package repositories.
It addresses the challenge of identifying problematic areas within a software ecosystem.
The results indicate that while there are well-maintained projects within the FLOSS ecosystem, there are also high-impact projects that are susceptible to supply chain attacks.
arXiv Detail & Related papers (2025-02-12T08:57:57Z) - A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent.
This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages.
We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z) - RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z) - An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries [52.23798016734889]
This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries.
The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges.
arXiv Detail & Related papers (2024-09-27T16:20:20Z) - Contributing Back to the Ecosystem: A User Survey of NPM Developers [10.154686574810501]
Survey of 49 developers from the NPM ecosystem.
We find that developers are more likely to maintain their own packages rather than contribute to the ecosystem.
Our results open up new avenues into tool support and research into how to sustain these ecosystems.
arXiv Detail & Related papers (2024-07-01T00:15:55Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Biomedical Open Source Software: Crucial Packages and Hidden Heroes [2.3960586265742574]
We map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems.
We propose the centrality metrics for the network of software dependencies, analyze three ecosystems (PyPi, CRAN, Bioconductor) and determine the packages with the highest centrality.
arXiv Detail & Related papers (2024-04-10T01:22:02Z) - An Introduction to Software Ecosystems [7.574742446357262]
This chapter defines and presents different kinds of software ecosystems.
The focus is on the development, tooling and analytics aspects of software ecosystems.
The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems.
arXiv Detail & Related papers (2023-07-28T17:58:59Z) - The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z) - Ecosystem Graphs: The Social Footprint of Foundation Models [64.02855828418608]
We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem.
Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships.
arXiv Detail & Related papers (2023-03-28T07:18:29Z) - Data Science for Engineers: A Teaching Ecosystem [59.00739310930656]
We describe an ecosystem for teaching data science to engineers at the Faculty of Physical and Mathematical Sciences, Universidad de Chile.
This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional environments.
By sharing our teaching principles and the innovative components of our approach to teaching DS, we hope our experience can be useful to those developing their own DS programmes and ecosystems.
arXiv Detail & Related papers (2021-01-14T14:17:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.