Measuring Software Innovation with Open Source Software Development Data
- URL: http://arxiv.org/abs/2411.05087v1
- Date: Thu, 07 Nov 2024 19:11:32 GMT
- Title: Measuring Software Innovation with Open Source Software Development Data
- Authors: Eva Maxfield Brown, Cailean Osborne, Peter Cihon, Moritz Böhmecke-Schwafert, Kevin Xu, Mirko Boehm, Knut Blind,
- Abstract summary: This paper introduces a novel measure of software innovation based on open source software (OSS) development activity on GitHub.
We examine the dependency growth and release complexity among $sim$200,000 unique releases from 28,000 unique packages over two years post-release.
We conclude that major releases of OSS packages count as a unit of innovation complementary to scientific publications, patents, and standards.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel measure of software innovation based on open source software (OSS) development activity on GitHub. We examine the dependency growth and release complexity among $\sim$200,000 unique releases from 28,000 unique packages across the JavaScript, Python, and Ruby ecosystems over two years post-release. We find that major versions show differential, strong prediction of one-year lagged log change in dependencies. In addition, semantic versioning of OSS releases is correlated with their complexity and predict downstream adoption. We conclude that major releases of OSS packages count as a unit of innovation complementary to scientific publications, patents, and standards, offering applications for policymakers, managers, and researchers.
Related papers
- The Evaluation of Open Source Software Innovativeness [0.0]
It suggests an innovation typology supported by the notion of functional added value.<n>By showing the shortcomings of widely used innovation metrics, this research supports a new approach of innovativeness assessment specialized in each sector.
arXiv Detail & Related papers (2025-05-06T07:53:11Z) - Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z) - Open Source Software Lifecycle Classification: Developing Wrangling Techniques for Complex Sociotechnical Systems [0.0]
This paper reviews previous attempts to classify open source software and other organizational ecosystems.<n>It examines the divergent and sometimes conflicting purposes that may exist for classifying open source projects and how these competing interests impede our progress in developing a comprehensive understanding of how open source software projects and companies operate.
arXiv Detail & Related papers (2025-04-23T12:37:53Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Open Source at a Crossroads: The Future of Licensing Driven by Monetization [11.149764135999437]
Open Source Software Licenses (OSS licenses) ensure that software can be sold or distributed as part of aggregate programs from various sources without requiring a royalty or fee.
We argue that open source is at a crossroads, with a growing need to redefine its licensing models and support communities and critical software.
arXiv Detail & Related papers (2025-03-04T17:44:01Z) - Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z) - Toward a Cohesive AI and Simulation Software Ecosystem for Scientific Innovation [2.0580344655030554]
We discuss the need for an integrated software stack that unites artificial intelligence (AI) and modeling and simulation (ModSim) tools to advance scientific discovery.
Key challenges highlighted include balancing the distinct needs of AI and ModSim, especially in terms of software build practices, dependency management, and compatibility.
arXiv Detail & Related papers (2024-11-14T15:17:50Z) - Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
Lingma SWE-GPT series learns from and simulating real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z) - Characterising Open Source Co-opetition in Company-hosted Open Source Software Projects: The Cases of PyTorch, TensorFlow, and Transformers [5.2337753974570616]
Companies, including market rivals, have long collaborated on the development of open source software (OSS)
"Open source co-opetition" results in a tangle of co-operation and competition known as "open source co-opetition"
arXiv Detail & Related papers (2024-10-23T19:35:41Z) - An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries [52.23798016734889]
This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries.
The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges.
arXiv Detail & Related papers (2024-09-27T16:20:20Z) - The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot [0.0]
Large Language Models (LLMs) have been shown to enhance individual productivity in guided settings.<n>This paper explores whether LLMs affect two aspects of collaborative work: capability innovation and iterative innovation.<n>We focus on open-source projects on GitHub by leveraging a natural experiment around the selective rollout of GitHub Copilot.<n>We observe a significant jump in overall contributions, suggesting that LLMs effectively augment collaborative innovation in an unguided setting.
arXiv Detail & Related papers (2024-09-12T19:59:54Z) - PVAC: Package Version Activity Categorizer, Leveraging Semantic Versioning in a Heterogeneous System [0.0]
This research aims to introduce a systematic method and a prototype tool for assessing version activity within heterogeneous package manager ecosystems.<n>We developed a Package Version Activity Categorizer (PVAC) that consists of three components.<n>PVAC parses semantic versioning details from diverse package version strings, enabling consistent categorization and quantitative scoring of version changes.
arXiv Detail & Related papers (2024-09-06T19:58:20Z) - Contemporary Software Modernization: Perspectives and Challenges to Deal with Legacy Systems [48.33168695898682]
"Software modernization" emerged as a research topic in the early 2000s.
Despite the large amount of work available in the literature, there are significant limitations.
arXiv Detail & Related papers (2024-07-04T15:49:52Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM)
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - Towards a Structural Equation Model of Open Source Blockchain Software
Health [0.0]
This work uses exploratory factor analysis to identify latent constructs that are representative of general public interest or popularity in software.
We find that interest is a combination of stars, forks, and text mentions in the GitHub repository, while a second factor for robustness is composed of a criticality score.
A structural model of software health is proposed such that general interest positively influences developer engagement, which, in turn, positively predicts software robustness.
arXiv Detail & Related papers (2023-10-31T08:47:41Z) - The Software Heritage Open Science Ecosystem [0.0]
Software Heritage is the largest public archive of software source code and associated development history.
It has archived more than 16 billion unique source code files coming from more than 250 million collaborative development projects.
It supports empirical research on software by materializing in a single Merkle direct acyclic graph the development history of public code.
It ensures availability and guarantees integrity of the source code of software artifacts used in any field that relies on software to conduct experiments.
arXiv Detail & Related papers (2023-10-16T11:32:03Z) - SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z) - The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z) - Underproduction: An Approach for Measuring Risk in Open Source Software [9.701036831490766]
'Underproduction' occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced.
We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset.
arXiv Detail & Related papers (2021-02-27T23:18:21Z) - An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z) - Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.