Guiding Effort Allocation in Open-Source Software Projects Using Bus
Factor Analysis
- URL: http://arxiv.org/abs/2401.03303v1
- Date: Sat, 6 Jan 2024 20:55:40 GMT
- Title: Guiding Effort Allocation in Open-Source Software Projects Using Bus
Factor Analysis
- Authors: Aliza Lisan, Boyana Norris
- Abstract summary: The Bus Factor (BF) of a project is defined as 'the number of key developers who would need to be incapacitated to make a project unable to proceed'.
We propose using other metrics like lines of code changes (LOCC) and cosine difference of lines of code (change-size-cos) to calculate the BF.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critical issue faced by open-source software projects is the risk of key
personnel leaving the project. This risk is exacerbated in large projects that
have been under development for a long time and experienced growth in their
development teams. One way to quantify this risk is to measure the
concentration of knowledge about the project among its developers. This measure, formally known as the Bus Factor (BF) of a project, is defined as 'the number of key developers who would need to be incapacitated to make a project unable to proceed'. Most of the proposed algorithms for BF calculation measure a
developer's knowledge of a file based on the number of commits. In this work,
we propose using other metrics like lines of code changes (LOCC) and cosine
difference of lines of code (change-size-cos) to calculate the BF. We apply these metrics to five open-source GitHub projects using the CST algorithm and the git-blame-based RIG algorithm. Moreover, we
calculate the BF on project sub-directories that have seen the most active
development recently. Lastly, we compare the two algorithms in terms of accuracy, similarity of results, execution time, and trends in BF values over time.
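
To make the commit-based versus LOCC-based distinction concrete, the sketch below estimates per-file knowledge from LOCC in the git history and then computes a greedy BF. It is a minimal illustration, not the authors' CST or RIG implementation: the 50% knowledge-share and abandonment thresholds are illustrative assumptions, and `git log --numstat` is just one convenient way to obtain LOCC per author.

```python
import subprocess
from collections import defaultdict

def locc_by_author(repo_path):
    """Map file -> author -> LOCC (lines added + deleted), via `git log --numstat`."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=@%aN"],
        capture_output=True, text=True, check=True,
    ).stdout
    knowledge = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log.splitlines():
        if line.startswith("@"):
            author = line[1:]
        elif "\t" in line:
            added, deleted, path = line.split("\t", 2)
            if added != "-":  # numstat reports "-" for binary files
                knowledge[path][author] += int(added) + int(deleted)
    return knowledge

def bus_factor(knowledge, know_share=0.5, abandon_share=0.5):
    """Greedy BF: remove the developer who 'knows' the most files until more
    than `abandon_share` of files have no knowledgeable developer left."""
    knowers = {}
    for path, per_author in knowledge.items():
        total = sum(per_author.values())
        if total:
            knowers[path] = {a for a, n in per_author.items() if n / total >= know_share}
    active = set().union(*knowers.values()) if knowers else set()
    bf = 0
    while active:
        abandoned = sum(1 for devs in knowers.values() if not devs & active)
        if abandoned / len(knowers) > abandon_share:
            break
        active.discard(max(active, key=lambda a: sum(a in devs for devs in knowers.values())))
        bf += 1
    return bf

# Hypothetical usage on a local checkout:
# print(bus_factor(locc_by_author("path/to/repo")))
```

Swapping the LOCC accumulation for a per-commit counter recovers the commit-count baseline that the abstract contrasts against.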
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention, but their parametric knowledge is fixed at training time.
Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG).
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
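
For readers new to the term, the retrieve-then-generate shape of RAG can be sketched in a few lines. This is a toy illustration, not RAGLAB's API: the bag-of-words retriever stands in for a real embedding index, and `generate` is assumed to be any LLM completion callable.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Toy bag-of-words cosine similarity; a real system would use embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(question, corpus, generate, k=3):
    """Retrieve the k passages most similar to the question, then condition
    generation on them: the essence of retrieval-augmented generation."""
    hits = sorted(corpus, key=lambda doc: bow_cosine(question, doc), reverse=True)[:k]
    prompt = "Context:\n" + "\n".join(hits) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

# Hypothetical usage, where `my_llm` is any prompt -> text callable:
# rag_answer("What is the bus factor?", docs, generate=my_llm)
```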
arXiv Detail & Related papers (2024-08-21T07:20:48Z)
- FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents [64.1759086221016]
We present FlowBench, the first benchmark for workflow-guided planning.
FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats.
Results indicate that current LLM agents need considerable improvements for satisfactory planning.
arXiv Detail & Related papers (2024-06-21T06:13:00Z)
- DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
arXiv Detail & Related papers (2024-01-12T06:51:30Z)
- Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects.
The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities.
With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
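
The paper ships its own tooling; as a hedged illustration of what such ownership metrics can look like, the sketch below computes, per file, the top contributor's share of commits and the count of minor contributors. The 5% minor-contributor cutoff is an assumption borrowed from earlier code-ownership studies, not necessarily this paper's definition.

```python
import subprocess
from collections import Counter, defaultdict

def ownership_report(repo_path, minor_cutoff=0.05):
    """Per file: top contributor's commit share ('ownership') and the number of
    'minor' contributors (share below `minor_cutoff`, an illustrative choice)."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format=@%aN"],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = defaultdict(Counter)  # file -> Counter(author -> commits touching it)
    author = None
    for line in log.splitlines():
        if line.startswith("@"):
            author = line[1:]
        elif line.strip():
            commits[line][author] += 1
    report = {}
    for path, by_author in commits.items():
        total = sum(by_author.values())
        report[path] = {
            "ownership": by_author.most_common(1)[0][1] / total,
            "minor_contributors": sum(1 for n in by_author.values() if n / total < minor_cutoff),
        }
    return report

# Hypothetical usage:
# for path, metrics in ownership_report("path/to/repo").items():
#     print(path, metrics)
```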
arXiv Detail & Related papers (2023-12-18T00:37:29Z)
- Individual context-free online community health indicators fail to identify open source software sustainability [3.192308005611312]
We monitored thirty-eight open source projects over the period of a year.
None of the projects were abandoned during this period, and only one project entered a planned shutdown.
Results were highly heterogeneous, showing little commonality across documentation, mean response times for issues and code contributions, and available funding/staffing resources.
arXiv Detail & Related papers (2023-09-21T14:41:41Z)
- How Early Participation Determines Long-Term Sustained Activity in GitHub Projects? [20.236570418427533]
We aim to explore the relationship between early participation factors and long-term project sustainability.
We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects.
We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation.
arXiv Detail & Related papers (2023-08-11T08:24:41Z)
- Fast Optimal Locally Private Mean Estimation via Random Projections [58.603579803010796]
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball.
We propose a new algorithmic framework, ProjUnit, for private mean estimation.
Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.
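
The summary describes the client-side recipe precisely enough to sketch with numpy. The final 'optimal algorithm' is stubbed as an identity placeholder, so this shows only the project-and-normalize step under assumed dimensions, not the paper's actual private randomizer.

```python
import numpy as np

def proj_unit_client(x, k, rng):
    """ProjUnit-style client step (sketch): project the unit vector x onto a
    random k-dimensional subspace, renormalize, then hand the result to a
    low-dimensional randomizer (stubbed below)."""
    d = x.shape[0]
    # QR of a Gaussian matrix yields an orthonormal basis of a random subspace.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    y = q.T @ x                 # project: R^d -> R^k
    y /= np.linalg.norm(y)      # renormalize onto the unit sphere in R^k
    return low_dim_randomizer(y)

def low_dim_randomizer(y):
    # Placeholder for the optimal locally private unit-vector randomizer the
    # paper plugs in; identity here, purely for illustration.
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
x /= np.linalg.norm(x)
print(proj_unit_client(x, k=64, rng=rng).shape)  # (64,)
```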
arXiv Detail & Related papers (2023-06-07T14:07:35Z)
- Leveraging Data Mining Algorithms to Recommend Source Code Changes [7.959841510571622]
This paper proposes an automatic method for recommending source code changes using four data mining algorithms.
We compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time.
Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects.
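
As a dependency-free illustration of the association-rule idea underlying such recommenders, the sketch below mines pairwise 'co-change' rules from toy commit transactions; the file names are invented, and full Apriori or Eclat mine itemsets of arbitrary size rather than just pairs.

```python
from collections import Counter
from itertools import combinations

# Each "transaction" is the set of files touched by one commit (toy data).
commits = [
    {"parser.c", "parser.h", "tests/test_parser.c"},
    {"parser.c", "parser.h"},
    {"lexer.c", "parser.c"},
    {"lexer.c", "lexer.h"},
]

single = Counter(f for c in commits for f in c)
pair = Counter(frozenset(p) for c in commits for p in combinations(sorted(c), 2))

# Association rules A -> B over file pairs, where confidence = P(B | A):
# "if a change touched A, it probably should also touch B".
for p, n_ab in pair.items():
    a, b = tuple(p)
    for ante, cons in ((a, b), (b, a)):
        confidence = n_ab / single[ante]
        if confidence >= 0.8:
            print(f"edit {ante} -> also edit {cons} (confidence {confidence:.2f})")
```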
arXiv Detail & Related papers (2023-04-29T18:38:23Z)
- Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to mark up at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z)
- Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set [1.1470070927586014]
We study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data.
We provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity.
arXiv Detail & Related papers (2022-01-12T17:25:30Z)