Who is the Real Hero? Measuring Developer Contribution via
Multi-dimensional Data Integration
- URL: http://arxiv.org/abs/2308.08991v2
- Date: Thu, 31 Aug 2023 07:42:39 GMT
- Title: Who is the Real Hero? Measuring Developer Contribution via
Multi-dimensional Data Integration
- Authors: Yuqiang Sun, Zhengzi Xu, Chengwei Liu, Yiran Zhang, Yang Liu
- Abstract summary: We propose CValue, a multidimensional information fusion-based approach to measure developer contributions.
CValue extracts both syntax and semantic information from the source code changes in four dimensions.
It fuses this information to produce a contribution score for each commit in a project.
- Score: 8.735393610868435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proper incentives are important for motivating developers in open-source
communities, which is crucial for keeping the development of open-source
software healthy. To provide such incentives, an accurate and objective
developer contribution measurement method is needed. However, existing methods
rely heavily on manual peer review, lacking objectivity and transparency. Some
automated effort-estimation approaches use only syntax-level or even text-level
metrics, such as changed lines of code, which lack robustness. Furthermore,
some works on identifying core developers provide only a qualitative
understanding without a quantitative score, or rely on project-specific
parameters, which makes them impractical in real-world
projects. To this end, we propose CValue, a multidimensional information
fusion-based approach to measure developer contributions. CValue extracts both
syntax and semantic information from the source code changes in four
dimensions: modification amount, understandability, inter-function and
intra-function impact of modification. It fuses this information to produce a
contribution score for each commit in a project. Experimental
results show that CValue outperforms other approaches by 19.59% on 10
real-world projects with manually labeled ground truth. We also validated that
CValue's runtime of 83.39 seconds per commit is acceptable for application in
real-world projects. Furthermore, we performed a large-scale experiment on 174
projects and detected 2,282 developers with inflated commits. Of these, 2,050
made no syntax contribution, and 103 were identified as bots.
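To make the fusion step concrete, here is a minimal illustrative sketch of how per-commit scores along the four dimensions above could be combined into a single contribution score. The dataclass, the equal weights, and the plain weighted sum are assumptions made for illustration only, not CValue's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class CommitDimensions:
    """Per-commit scores along the four dimensions named in the abstract."""
    modification_amount: float    # how much code the commit changes
    understandability: float      # how comprehensible the change is
    inter_function_impact: float  # impact on callers/callees of changed functions
    intra_function_impact: float  # impact within the changed functions themselves

def contribution_score(dims: CommitDimensions,
                       weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Fuse the four dimension scores into one score.

    A plain weighted sum is assumed here purely for illustration; the paper's
    actual fusion method may differ.
    """
    values = (dims.modification_amount,
              dims.understandability,
              dims.inter_function_impact,
              dims.intra_function_impact)
    return sum(w * v for w, v in zip(weights, values))

# Example: a careful refactoring vs. a bulk rename with little semantic impact.
refactor = CommitDimensions(0.4, 0.8, 0.6, 0.7)
bulk_rename = CommitDimensions(0.9, 0.2, 0.1, 0.1)
print(contribution_score(refactor), contribution_score(bulk_rename))
```

In practice each dimension score would come from analyzing the commit's diff (e.g., its syntax tree and call relations), and the weights would be tuned rather than fixed.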
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z) - Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate, for example, that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z) - Productivity Assessment of Neural Code Completion [4.821593904732654]
We ask users of GitHub Copilot about its impact on their productivity, and seek to find a reflection of their perception in directly measurable user data.
We find that the rate with which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers' perception of productivity.
arXiv Detail & Related papers (2022-05-13T09:53:25Z) - Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z) - Automated Mapping of Vulnerability Advisories onto their Fix Commits in
Open Source Repositories [7.629717457706326]
We present an approach that combines practical experience and machine learning (ML).
An advisory record containing key information about a vulnerability is extracted from an advisory.
A subset of candidate fix commits is obtained from the source code repository of the affected project.
arXiv Detail & Related papers (2021-03-24T17:50:35Z) - The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics
Influences Code Understanding [10.644832702859484]
We investigate whether a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility.
We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating.
arXiv Detail & Related papers (2020-12-16T14:27:45Z) - Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
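Relating to the last entry above, the following is a small sketch (not the paper's actual World of Code pipeline) of representing each developer by the APIs used in the files they changed and embedding them with gensim's Doc2Vec; the developer names and API lists are invented for illustration, and the gensim 4.x API is assumed.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each developer is treated as a "document": the bag of APIs appearing in the
# files they changed (toy data, made up for illustration).
developer_apis = {
    "alice": ["java.util.List", "java.util.Map", "org.junit.Test"],
    "bob":   ["java.util.List", "java.io.File", "java.nio.file.Path"],
    "carol": ["org.junit.Test", "org.mockito.Mockito", "java.util.Map"],
}

corpus = [TaggedDocument(words=apis, tags=[dev])
          for dev, apis in developer_apis.items()]

# Small vector size and many epochs because the toy corpus is tiny.
model = Doc2Vec(corpus, vector_size=8, min_count=1, epochs=50)

# Developers with overlapping API usage end up close in the embedding space,
# which is the kind of "Skill Space" structure the paper evaluates.
print(model.dv.most_similar("alice"))
```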