A Large-Scale Study on Developer Engagement and Expertise in Configurable Software System Projects
- URL: http://arxiv.org/abs/2508.18070v1
- Date: Mon, 25 Aug 2025 14:29:20 GMT
- Title: A Large-Scale Study on Developer Engagement and Expertise in Configurable Software System Projects
- Authors: Karolina M. Milano, Wesley K. G. Assunção, Bruno B. P. Cafeo,
- Abstract summary: This study investigates developers' engagement with variable versus mandatory code, the concentration of variable code workload, and the effectiveness of expertise metrics in CSS projects. Results show that 59% of developers never modified variable code, while about 17% were responsible for developing and maintaining 83% of it. This indicates a high concentration of variable code expertise among a few developers, suggesting that task assignments should prioritize these specialists.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern systems operate in multiple contexts, making variability a fundamental aspect of Configurable Software Systems (CSSs). Variability, implemented via pre-processor directives (e.g., #ifdef blocks), interleaved with other code, and spread across files, complicates maintenance and increases error risk. Despite its importance, little is known about how variable code is distributed among developers or whether conventional expertise metrics adequately capture variable code proficiency. This study investigates developers' engagement with variable versus mandatory code, the concentration of variable code workload, and the effectiveness of expertise metrics in CSS projects. We mined repositories of 25 CSS projects, analyzing 450,255 commits from 9,678 developers. Results show that 59% of developers never modified variable code, while about 17% were responsible for developing and maintaining 83% of it. This indicates a high concentration of variable code expertise among a few developers, suggesting that task assignments should prioritize these specialists. Moreover, conventional expertise metrics performed poorly, achieving only around 55% precision and 50% recall in identifying developers engaged with variable code. Our findings highlight an unbalanced distribution of variable code responsibilities and underscore the need to refine expertise metrics to better support task assignments in CSS projects, thereby promoting a more equitable workload distribution.
Related papers
- Agentic Much? Adoption of Coding Agents on GitHub [6.395990525268647]
We present the first large-scale study of the adoption of coding agents on GitHub. We find an estimated adoption rate of 15.85%-22.60%, which is very high for a technology only a few months old, and increasing. At the commit level, we find that commits assisted by coding agents are larger than commits only authored by human developers.
arXiv Detail & Related papers (2026-01-26T10:28:10Z) - CodeClash: Benchmarking Goal-Oriented Software Engineering [63.65464283837602]
We run 1680 tournaments (25,200 rounds total) to evaluate 8 LMs across 6 arenas. Our results reveal that while models exhibit diverse development styles, they share fundamental limitations in strategic reasoning. We open-source CodeClash to advance the study of autonomous, goal-oriented code development.
arXiv Detail & Related papers (2025-11-02T07:42:51Z) - RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z) - Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z) - Navigating Expertise in Configurable Software Systems through the Maze of Variability [0.0]
This research study investigates the distribution of development efforts in CSS.
It also examines the engagement of designated experts with variable code in their assigned files.
arXiv Detail & Related papers (2024-01-19T14:03:33Z) - DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
arXiv Detail & Related papers (2024-01-12T06:51:30Z) - Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration [8.735393610868435]
We propose CValue, a multidimensional information fusion-based approach to measure developer contributions.
CValue extracts both syntax and semantic information from the source code changes in four dimensions.
It fuses the information to produce the contribution score for each of the commits in the projects.
arXiv Detail & Related papers (2023-08-17T13:57:44Z) - CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z) - Trusting code in the wild: A social network-based centrality rating for developers in the Rust ecosystem [1.3581810800092387]
This study builds a social network of 6,949 developers across the collaboration activity from 1,644 Rust packages.
We evaluate whether code coming from a developer with a higher centrality rating is likely to be accepted with less scrutiny by downstream projects.
arXiv Detail & Related papers (2023-05-31T23:24:03Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.