Who is the Real Hero? Measuring Developer Contribution via
Multi-dimensional Data Integration
- URL: http://arxiv.org/abs/2308.08991v2
- Date: Thu, 31 Aug 2023 07:42:39 GMT
- Title: Who is the Real Hero? Measuring Developer Contribution via
Multi-dimensional Data Integration
- Authors: Yuqiang Sun, Zhengzi Xu, Chengwei Liu, Yiran Zhang, Yang Liu
- Abstract summary: We propose CValue, a multidimensional information fusion-based approach to measure developer contributions.
CValue extracts both syntax and semantic information from the source code changes in four dimensions.
It fuses the information to produce the contribution score for each of the commits in the projects.
- Score: 8.735393610868435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proper incentives are important for motivating developers in open-source
communities, which is crucial for keeping open-source software development
healthy. To provide such incentives, an accurate and objective
developer contribution measurement method is needed. However, existing methods
rely heavily on manual peer review, lacking objectivity and transparency. The
metrics of some automated works about effort estimation use only syntax-level
or even text-level information, such as changed lines of code, which lack
robustness. Furthermore, some works about identifying core developers provide
only a qualitative understanding without a quantitative score or have some
project-specific parameters, which makes them not practical in real-world
projects. To this end, we propose CValue, a multidimensional information
fusion-based approach to measure developer contributions. CValue extracts both
syntax and semantic information from the source code changes in four
dimensions: modification amount, understandability, inter-function and
intra-function impact of modification. It fuses the information to produce the
contribution score for each of the commits in the projects. Experimental
results show that CValue outperforms other approaches by 19.59% on 10
real-world projects with manually labeled ground truth. We validated that the
runtime of CValue, 83.39 seconds per commit, is acceptable for application in
real-world projects. Furthermore, we performed a
large-scale experiment on 174 projects and detected 2,282 developers having
inflated commits. Of these, 2,050 developers did not make any syntactic
contribution, and 103 were identified as bots.
Related papers
- FeatureBench: Benchmarking Agentic Coding for Complex Feature Development [42.26354337364403]
FeatureBench is a benchmark designed to evaluate agentic coding performance in end-to-end, feature-oriented software development.
It incorporates an execution-based evaluation protocol and a scalable test-driven method that automatically derives tasks from code repositories with minimal human effort.
Empirical evaluation reveals that state-of-the-art agentic models, such as Claude 4.5 Opus, achieve a 74.4% resolved rate on SWE-bench.
arXiv Detail & Related papers (2026-02-11T16:06:32Z)
- Failure-Aware Enhancements for Large Language Model (LLM) Code Generation: An Empirical Study on Decision Framework [0.26508608365976566]
In an empirical study of 25 GitHub projects, we found that progressive prompting achieves 96.9% average task completion.
Self-critique succeeds on code-reviewable logic errors but fails completely on external service integration.
RAG achieves highest completion across all failure types with superior efficiency.
arXiv Detail & Related papers (2026-02-02T23:08:03Z)
- Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries.
This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers.
We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z)
- On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub [6.7302091035327285]
Large language models (LLMs) are increasingly being integrated into software development processes.
The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice.
We empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 open-source projects.
arXiv Detail & Related papers (2025-09-18T08:48:32Z)
- CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews.
We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
- A Large-Scale Study on Developer Engagement and Expertise in Configurable Software System Projects [3.8994048150471166]
This study investigates developers' engagement with variable versus mandatory code, the concentration of variable code, workload, and the effectiveness of expertise metrics in CSS projects.
Results show that 59% of developers never modified variable code, while about 17% were responsible for developing and maintaining 83% of it.
This indicates a high concentration of variable code expertise among a few developers, suggesting that task assignments should prioritize these specialists.
arXiv Detail & Related papers (2025-08-25T14:29:20Z)
- Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [66.1850490474361]
We conduct the first academic study to explore developer interactions with coding agents.
We evaluate two leading copilot and agentic coding assistants, GitHub Copilot and OpenHands.
Our results show agents have the potential to assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
- ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation.
Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots.
We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z)
- SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair [51.0686873716938]
We introduce SolBench, a benchmark for evaluating the functional correctness of Solidity smart contracts generated by code completion models.
We propose a Retrieval-Augmented Code Repair framework to verify functional correctness of smart contracts.
Results show that code repair and retrieval techniques effectively enhance the correctness of smart contract completion while reducing computational costs.
arXiv Detail & Related papers (2025-03-03T01:55:20Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
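As a rough sketch of the offline metric that correlated best in this entry, the following computes a word-level Levenshtein edit distance between a generated and a final commit message. The word-level tokenization and the example messages are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: word-level Levenshtein edit distance, one way to
# approximate "edits the user makes before committing". The example
# messages below are hypothetical.

def edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two messages."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))
    for i, sw in enumerate(s, 1):
        curr = [i]
        for j, tw in enumerate(t, 1):
            cost = 0 if sw == tw else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

generated = "fix bug in parser"
committed = "fix crash in json parser"
distance = edit_distance(generated, committed)  # 2 word edits
```

Unlike n-gram overlap scores such as BLEU, this distance directly counts the substitutions and insertions a user would have to make, which is why it tracks the online metric more closely.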
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Productivity Assessment of Neural Code Completion [4.821593904732654]
We ask users of GitHub Copilot about its impact on their productivity, and seek to find a reflection of their perception in directly measurable user data.
We find that the rate with which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers' perception of productivity.
arXiv Detail & Related papers (2022-05-13T09:53:25Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories [7.629717457706326]
We present an approach that combines practical experience and machine learning (ML).
An advisory record containing key information about a vulnerability is extracted from an advisory.
A subset of candidate fix commits is obtained from the source code repository of the affected project.
arXiv Detail & Related papers (2021-03-24T17:50:35Z)
- The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics Influences Code Understanding [10.644832702859484]
We investigate whether a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility.
We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating.
arXiv Detail & Related papers (2020-12-16T14:27:45Z)
- Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
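The last entry above represents developers by the APIs they touch. As a much-simplified stand-in for its Doc2Vec embeddings, the sketch below represents each developer by the set of APIs in the files they changed and compares developers with Jaccard similarity; all developer names and API sets are hypothetical, and the paper's actual Skill Space uses learned dense vectors rather than raw sets.

```python
# Hedged sketch: set-based developer similarity as a crude proxy for
# the paper's Doc2Vec-based Skill Space. Names and API sets below
# are hypothetical.

def jaccard(a: set, b: set) -> float:
    """Similarity of two API sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

dev_apis = {
    "alice": {"requests.get", "json.loads", "re.sub"},
    "bob":   {"requests.get", "json.loads", "os.path.join"},
    "carol": {"numpy.dot", "numpy.mean"},
}
sim_ab = jaccard(dev_apis["alice"], dev_apis["bob"])    # shared web/JSON APIs
sim_ac = jaccard(dev_apis["alice"], dev_apis["carol"])  # disjoint skill areas
```

Dense embeddings improve on this by placing related but non-identical APIs near each other, which raw set overlap cannot capture.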
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.