How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?
- URL: http://arxiv.org/abs/2308.06005v4
- Date: Thu, 28 Sep 2023 15:39:47 GMT
- Title: How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?
- Authors: Wenxin Xiao, Hao He, Weiwei Xu, Yuxia Zhang, and Minghui Zhou
- Abstract summary: We aim to explore the relationship between early participation factors and long-term project sustainability.
We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects.
We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation.
- Score: 20.236570418427533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although the open source model bears many advantages in software development,
open source projects are always hard to sustain. Previous research on open
source sustainability mainly focuses on projects that have already reached a
certain level of maturity (e.g., with communities, releases, and downstream
projects). However, limited attention is paid to the development of
(sustainable) open source projects in their infancy, and we believe an
understanding of early sustainability determinants is crucial for project
initiators, incubators, newcomers, and users.
In this paper, we aim to explore the relationship between early participation
factors and long-term project sustainability. We leverage a novel methodology
combining the Blumberg model of performance and machine learning to predict the
sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost
model on early participation (the first three months of activity) in these
projects and interpret the model using LIME. We quantitatively show that early
participants have a positive effect on a project's future sustained activity
if they have prior experience in OSS project incubation and demonstrate
concentrated focus and steady commitment. Participation from non-code
contributors and detailed contribution documentation also promote a
project's sustained activity. Compared with individual projects, building a
community that consists of more experienced core developers and more active
peripheral developers is important for organizational projects. This study
provides unique insights into the incubation and recognition of sustainable
open source projects, and our interpretable prediction approach can also offer
guidance to open source project initiators and newcomers.
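The core step of the approach is turning a project's first three months of activity into features that a classifier (XGBoost in the paper, interpreted with LIME) can consume. Below is a minimal Python sketch of that windowing step only, assuming a hypothetical event log format: the `Event` record, the `kind` labels, `CODE_KINDS`, and the three feature names are illustrative assumptions, not the paper's actual data schema or feature set. It counts early participants, counts participants who made no code contributions, and approximates steady commitment as the share of weeks in the window with any activity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event kinds treated as code contributions (an assumption).
CODE_KINDS = {"commit"}


@dataclass(frozen=True)
class Event:
    """A hypothetical activity record from a project's history."""
    author: str
    time: datetime
    kind: str  # e.g. "commit", "issue", "comment"


def early_participation_features(events, window_days=90):
    """Summarize participation in the first `window_days` after the first event.

    The feature names are illustrative, not the paper's feature set.
    """
    if not events:
        raise ValueError("need at least one event")
    start = min(e.time for e in events)
    end = start + timedelta(days=window_days)
    early = [e for e in events if e.time < end]

    authors = {e.author for e in early}
    coders = {e.author for e in early if e.kind in CODE_KINDS}

    # Proxy for steady commitment: share of weeks in the window with any event.
    n_weeks = -(-window_days // 7)  # ceiling division
    active_weeks = {(e.time - start).days // 7 for e in early}

    return {
        "participants": len(authors),
        "non_code_participants": len(authors - coders),
        "active_week_ratio": len(active_weeks) / n_weeks,
    }


if __name__ == "__main__":
    t0 = datetime(2023, 1, 1)
    events = [
        Event("alice", t0, "commit"),
        Event("bob", t0 + timedelta(days=3), "issue"),
        Event("alice", t0 + timedelta(days=10), "commit"),
        Event("carol", t0 + timedelta(days=120), "commit"),  # outside the window
    ]
    print(early_participation_features(events))
```

In this toy log, carol's commit falls outside the 90-day window, so the sketch reports 2 participants, 1 non-code participant (bob), and activity in 2 of 13 weeks. In a full pipeline, one such feature dictionary per project would form a row of the training matrix; the model and its LIME explanations are not reproduced here.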
Related papers
- CROSS: A Contributor-Project Interaction Lifecycle Model for Open Source Software [2.9631016562930546]
CROSS is a novel contributor-project interaction lifecycle model for open source software.
It explains a range of archetypal cases of contributor engagement and highlights research gaps, especially in EoS/offboarding scenarios.
Date: 2024-09-12T17:57:12Z
- DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
Date: 2024-01-12T06:51:30Z
- Guiding Effort Allocation in Open-Source Software Projects Using Bus Factor Analysis [1.0878040851638]
The Bus Factor (BF) of a project is defined as 'the number of key developers who would need to be incapacitated to make a project unable to proceed'.
We propose using other metrics like lines of code changes (LOCC) and cosine difference of lines of code (change-size-cos) to calculate the BF.
Date: 2024-01-06T20:55:40Z
- Unveiling Diversity: Empowering OSS Project Leaders with Community Diversity and Turnover Dashboards [51.67585198094836]
CommunityTapestry is a dynamic real-time community dashboard.
It presents key diversity and turnover signals that we identified from the literature.
It helped project leaders identify areas of improvement and gave them actionable information.
Date: 2023-12-13T22:12:57Z
- Individual context-free online community health indicators fail to identify open source software sustainability [3.192308005611312]
We monitored thirty-eight open source projects over the period of a year.
None of the projects were abandoned during this period, and only one project entered a planned shutdown.
Results were highly heterogeneous, showing little commonality across documentation, mean response times for issues and code contributions, and available funding/staffing resources.
Date: 2023-09-21T14:41:41Z
- Using Hashtags to Analyze Purpose and Technology Application of Open-Source Project Related to COVID-19 [5.89408513477919]
This study examines trends in projects with different functionalities and the relationship between functionalities and technologies.
The study results show an imbalance in the number of projects with varying functionalities in the GitHub community.
The spontaneous behavior of developers may lack organization and make it challenging to target needs.
Date: 2022-07-03T02:37:31Z
- Attracting and Retaining OSS Contributors with a Maintainer Dashboard [19.885747206499712]
We design a maintainer dashboard that provides recommendations on how to attract and retain open source contributors.
We conduct a project-specific evaluation with maintainers to better understand use cases in which this tool will be most helpful.
We distill our findings to share what the future of recommendations in open source looks like and how to make these recommendations most meaningful over time.
Date: 2022-02-15T21:39:37Z
- YMIR: A Rapid Data-centric Development Platform for Vision Applications [82.67319997259622]
This paper introduces an open source platform for rapid development of computer vision applications.
The platform puts efficient data development at the center of the machine learning development process.
Date: 2021-11-19T05:02:55Z
- Estimating Fund-Raising Performance for Start-up Projects from a Market Graph Perspective [58.353799280109904]
We propose a Graph-based Market Environment (GME) model for predicting the fund-raising performance of the unpublished project by exploiting the market environment.
Date: 2021-05-27T02:39:30Z
- Towards Utility-based Prioritization of Requirements in Open Source Environments [51.65930505153647]
We show how utility-based prioritization approaches can be used to support contributors in conventional and open source Requirements Engineering scenarios.
As an example, we show how dependencies can be taken into account in utility-based prioritization processes.
Date: 2021-02-17T09:05:54Z
- Knowledge Integration of Collaborative Product Design Using Cloud Computing Infrastructure [65.2157099438235]
This paper presents ongoing research on providing a knowledge integration service for collaborative product design and development using cloud computing infrastructure.
Proposed knowledge integration services support users by giving real-time access to knowledge resources.
Date: 2020-01-16T18:44:27Z