Individual context-free online community health indicators fail to identify open source software sustainability
- URL: http://arxiv.org/abs/2309.12120v3
- Date: Thu, 9 May 2024 14:34:08 GMT
- Title: Individual context-free online community health indicators fail to identify open source software sustainability
- Authors: Yo Yehudi, Carole Goble, Caroline Jay
- Abstract summary: We monitored thirty-eight open source projects over the period of a year.
None of the projects were abandoned during this period, and only one project entered a planned shutdown.
Results were highly heterogeneous, showing little commonality across documentation, mean response times for issues and code contributions, and available funding/staffing resources.
- Score: 3.192308005611312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The global value of open source software is estimated to be in the billions or trillions worldwide [1], but despite this, it is often under-resourced and subject to high-impact security vulnerabilities and stability failures [2,3]. In order to investigate factors contributing to open source community longevity, we monitored thirty-eight open source projects over the period of a year, focusing primarily, but not exclusively, on open science-related online code-oriented communities. We measured performance indicators, using both subjective and qualitative measures (participant surveys), as well as using computational scripts to retrieve and analyse indicators associated with these projects' online source control codebases. None of the projects were abandoned during this period, and only one project entered a planned shutdown. Project ages spanned from under one year to over forty years old at the start of the study, and results were highly heterogeneous, showing little commonality across documentation, mean response times for issues and code contributions, and available funding/staffing resources. Whilst source code-based indicators were able to offer some insights into project activity, we observed that similar indicators across different projects often had very different meanings when context was taken into account. We conclude that the individual context-free metrics we studied were not sufficient or essential for project longevity and sustainability, and might even become detrimental if used to support high-stakes decision making. When attempting to understand an online open community's longer-term sustainability, we recommend that researchers avoid cross-project quantitative comparisons, and advise instead that they use single-project-level assessments which combine quantitative measures with contextualising qualitative data.
Related papers
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- Free Open Source Communities Sustainability: Does It Make a Difference in Software Quality? [2.981092370528753]
This study aims to empirically explore how the different aspects of sustainability impact software quality.
16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects.
arXiv Detail & Related papers (2024-02-10T09:37:44Z)
- Guiding Effort Allocation in Open-Source Software Projects Using Bus Factor Analysis [1.0878040851638]
The Bus Factor (BF) of a project is defined as 'the number of key developers who would need to be incapacitated to make a project unable to proceed'.
We propose using other metrics like lines of code changes (LOCC) and cosine difference of lines of code (change-size-cos) to calculate the BF.
arXiv Detail & Related papers (2024-01-06T20:55:40Z)
- Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks [62.89746245940464]
Low-dimensional yet crucial geometric eigen-lengths often determine the success of some geometric tasks.
Humans have materialized such crucial geometric eigen-lengths in common sense.
It remains obscure and underexplored if learning systems can be equipped with similar capabilities.
arXiv Detail & Related papers (2023-12-25T04:41:52Z)
- Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects.
The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities.
With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
arXiv Detail & Related papers (2023-12-18T00:37:29Z)
- Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z)
- How Early Participation Determines Long-Term Sustained Activity in GitHub Projects? [20.236570418427533]
We aim to explore the relationship between early participation factors and long-term project sustainability.
We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects.
We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation.
arXiv Detail & Related papers (2023-08-11T08:24:41Z)
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs).
We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs.
Results show that popular n-gram matching metrics generally do not correlate well with human judgment.
arXiv Detail & Related papers (2023-07-20T17:59:41Z)
- Towards a Critical Open-Source Software Database [0.0]
The CrOSSD project aims to build a database of OSS projects and measure their current project "health" status.
Quantitative metrics will be gathered through automated crawling of meta information such as the number of contributors, commits and lines of code.
Qualitative metrics will be gathered for selected "critical" projects through manual analysis and automated tools.
arXiv Detail & Related papers (2023-05-02T10:43:21Z)
- Dimensions of Commonsense Knowledge [60.49243784752026]
We survey a wide range of popular commonsense sources with a special focus on their relations.
We consolidate these relations into 13 knowledge dimensions, each abstracting over more specific relations found in sources.
arXiv Detail & Related papers (2021-01-12T17:52:39Z)
- CNN-based Density Estimation and Crowd Counting: A Survey [65.06491415951193]
This paper comprehensively studies the crowd counting models, mainly CNN-based density map estimation methods.
According to the evaluation metrics, we select the top three performers on their crowd counting datasets.
We expect to make reasonable inference and prediction for the future development of crowd counting.
arXiv Detail & Related papers (2020-03-28T13:17:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.