The Dynamics of Innovation in Open Source Software Ecosystems
- URL: http://arxiv.org/abs/2411.14894v1
- Date: Fri, 22 Nov 2024 12:31:25 GMT
- Title: The Dynamics of Innovation in Open Source Software Ecosystems
- Authors: Gábor Mészáros, Johannes Wachs,
- Abstract summary: New libraries emerge at a remarkably predictable sub-linear rate within ecosystems per post.
The most widely used libraries are used many times more often than the average.
New users are more likely to use new libraries and new combinations.
- Score: 0.8594140167290099
- License:
- Abstract: Software libraries are the elementary building blocks of open source software ecosystems, extending the capabilities of programming languages beyond their standard libraries. Although ecosystem health is often quantified using data on libraries and their interdependencies, we know little about the rate at which new libraries are developed and used. Here we study imports of libraries in 12 different programming language ecosystems within millions of Stack Overflow posts over a 15 year period. New libraries emerge at a remarkably predictable sub-linear rate within ecosystems per post. As a consequence, the distribution of the frequency of use of libraries in all ecosystems is highly concentrated: the most widely used libraries are used many times more often than the average. Although new libraries come out more slowly over time, novel combinations of libraries appear at an approximately linear rate, suggesting that recombination is a key innovation process in software. Newer users are more likely to use new libraries and new combinations, and we find significant variation in the rates of innovation between countries. Our work links the evolution of OSS ecosystems to the literature on the dynamics of innovation, revealing how ecosystems grow and highlighting implications for sustainability.
Related papers
- Contributing Back to the Ecosystem: A User Survey of NPM Developers [10.154686574810501]
Survey of 49 developers from the NPM ecosystem.
We find that developers are more likely to maintain their own packages rather than contribute to the ecosystem.
Our results open up new avenues into tool support and research into how to sustain these ecosystems.
arXiv Detail & Related papers (2024-07-01T00:15:55Z) - A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become to self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z) - Ecosystem of Large Language Models for Code [7.7454423388704745]
This paper introduces a pioneering analysis of the code model ecosystem.
We first identify the popular and influential datasets, models, and contributors.
The top 3 most popular reuse types are fine-tuning, architecture sharing, and quantization.
arXiv Detail & Related papers (2024-05-27T01:31:30Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - Promises and Perils of Mining Software Package Ecosystem Data [10.787686237395816]
Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
arXiv Detail & Related papers (2023-05-29T03:09:48Z) - SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z) - Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z) - Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
arXiv Detail & Related papers (2022-10-11T12:30:05Z) - Bridging the Language Gap: An Empirical Study of Bindings for Open Source Machine Learning Libraries Across Software Package Ecosystems [9.339419442638983]
Machine learning libraries enable developers to integrate advanced ML functionality into their own applications.
However, popular ML libraries are not available in all programming languages and software package ecosystems.
We collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems.
arXiv Detail & Related papers (2022-01-18T18:53:21Z) - Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z) - Bayesian active learning for production, a systematic study and a
reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop such as partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.