Factoring Expertise, Workload, and Turnover into Code Review Recommendation
- URL: http://arxiv.org/abs/2312.17236v1
- Date: Thu, 28 Dec 2023 18:58:06 GMT
- Title: Factoring Expertise, Workload, and Turnover into Code Review Recommendation
- Authors: Fahimeh Hajari, Samaneh Malmir, Ehsan Mirsaeedi, Peter C. Rigby
- Abstract summary: We show that code review naturally spreads knowledge, thereby reducing the files at risk to turnover.
We develop novel recommenders to understand their impact on the level of expertise during review.
We are able to globally increase expertise during reviews (+3%), reduce workload concentration (-12%), and reduce the files at risk (-28%).
- Score: 4.492444446637857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developer turnover is inevitable on software projects and leads to knowledge
loss, a reduction in productivity, and an increase in defects. Mitigation
strategies to deal with turnover tend to disrupt and increase workloads for
developers. In this work, we suggest that through code review recommendation we
can distribute knowledge and mitigate turnover while more evenly distributing
review workload.
We conduct historical analyses to understand the natural concentration of
review workload and the degree of knowledge spreading that is inherent in code
review. Even though review workload is highly concentrated, we show that code
review naturally spreads knowledge, thereby reducing the files at risk to
turnover.
Using simulation, we evaluate existing code review recommenders and develop
novel recommenders to understand their impact on the level of expertise during
review, the workload of reviewers, and the files at risk to turnover. Our
simulations use seeded random replacement of reviewers to allow us to compare
the reviewer recommenders without the confounding variation of different
reviewers being replaced for each recommender.
Combining recommenders, we develop the SofiaWL recommender that suggests
experts with low active review workload when none of the files under review are
known by only one developer. In contrast, when knowledge is concentrated on one
developer, it sends the review to other reviewers to spread knowledge. For the
projects we study, we are able to globally increase expertise during reviews
(+3%), reduce workload concentration (-12%), and reduce the files at risk (-28%).
We make our scripts and data available in our replication package. Developers
can optimize for a particular outcome measure based on the needs of their
project, or use our GitHub bot to automatically balance the outcomes.
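
The historical analysis rests on two outcome measures: how concentrated review workload is and how many files are at risk to turnover. A minimal sketch of how such measures might be computed from a review history is below; the `reviews` data layout, the single-knower definition of "at risk", and the Gini-based concentration measure are illustrative assumptions, not the paper's replication scripts.

```python
from collections import defaultdict

def files_at_risk(reviews, active_devs):
    """Count files whose knowledge is held by at most one active developer
    (an assumed proxy for 'files at risk to turnover')."""
    knowers = defaultdict(set)
    for review in reviews:
        for path in review["files"]:
            knowers[path].add(review["author"])
            knowers[path].update(review["reviewers"])
    return sum(1 for devs in knowers.values() if len(devs & active_devs) <= 1)

def gini(review_counts):
    """Gini coefficient over per-reviewer review counts, used here as an
    assumed measure of workload concentration (0 = even, 1 = concentrated)."""
    xs = sorted(review_counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n
```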
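The simulation design replaces reviewers at random but seeds the random generator, so every recommender is evaluated against the same sequence of replacements. A hedged sketch of that idea follows; the `recommenders` interface and the review fields are assumptions for illustration.

```python
import random

def simulate(reviews, recommenders, seed=42):
    """Replay the review history, removing one randomly chosen reviewer per
    review; the fixed seed means every recommender faces exactly the same
    replacements, so outcomes are directly comparable."""
    rng = random.Random(seed)
    # Draw the replaced reviewer for each review once, up front.
    replaced = [rng.choice(review["reviewers"]) for review in reviews]
    results = {}
    for name, recommend in recommenders.items():
        results[name] = [
            (review["id"], removed, recommend(review, exclude=removed))
            for review, removed in zip(reviews, replaced)
        ]
    return results
```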
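SofiaWL's decision rule, as described in the abstract, switches between spreading knowledge and balancing expert workload. A minimal sketch under assumed helper callables (none of them from the replication package or GitHub bot) is:

```python
def sofia_wl(review, candidates, knowers, expertise, workload):
    """Hedged sketch of the SofiaWL rule described in the abstract.
    The callables are assumptions, not the paper's actual API:
      knowers(path)         -> set of developers who know the file
      expertise(dev, files) -> dev's expertise score over the files
      workload(dev)         -> dev's count of open (active) reviews
    """
    # Files known by exactly one developer are at risk to turnover.
    hoarded = [f for f in review["files"] if len(knowers(f)) == 1]
    if hoarded:
        # Knowledge is concentrated on one developer: route the review to
        # someone who does not yet know these files so knowledge spreads.
        learners = [c for c in candidates
                    if all(c not in knowers(f) for f in hoarded)]
        return min(learners or candidates, key=workload)
    # Otherwise suggest an expert, breaking ties by low active workload.
    return max(candidates,
               key=lambda c: (expertise(c, review["files"]), -workload(c)))
```

Passing the lookups in as callables keeps the sketch self-contained; in practice they would be backed by the project's review and authorship history.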
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Leveraging Reviewer Experience in Code Review Comment Generation [11.224317228559038]
We train deep learning models to imitate human reviewers in providing natural language code reviews.
The quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training.
We propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality.
arXiv Detail & Related papers (2024-09-17T07:52:50Z)
- Improving Automated Code Reviews: Learning from Experience [12.573740138977065]
This study investigates whether higher-quality reviews can be generated from automated code review models.
We find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews.
arXiv Detail & Related papers (2024-02-06T07:48:22Z)
- Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and Bystanders [6.538051328482194]
We build upon RevRecV1, the recommender that has been in production since 2018.
We find that reviewers were being assigned based on prior authorship of files.
Having an individual who is responsible for the review reduces the time taken for reviews by 11%.
arXiv Detail & Related papers (2023-12-28T17:55:13Z)
- ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation [0.6895577977557867]
Inspection of review process effectiveness and continuous improvement can boost development productivity.
We propose ReviewRanker, a semi-supervised learning based system aimed at assigning each code review a confidence score.
Our proposed method is trained on simple and well-defined labels provided by developers.
arXiv Detail & Related papers (2023-07-08T15:37:48Z)
- Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations [7.260832843615661]
We present CORAL, a novel approach to reviewer recommendation.
We use a socio-technical graph built from the rich set of entities.
We show that CORAL is able to model the manual history of reviewer selection remarkably well.
arXiv Detail & Related papers (2022-02-04T20:58:54Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Can We Automate Scientific Reviewing? [89.50052670307434]
We discuss the possibility of using state-of-the-art natural language processing (NLP) models to generate first-pass peer reviews for scientific papers.
We collect a dataset of papers in the machine learning domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers to generate reviews.
Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews.
arXiv Detail & Related papers (2021-01-30T07:16:53Z)
- Showing Your Work Doesn't Always Work [73.63200097493576]
"Show Your Work: Improved Reporting of Experimental Results" advocates for reporting the expected validation effectiveness of the best-tuned model.
We analytically show that their estimator is biased and uses error-prone assumptions.
We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation.
arXiv Detail & Related papers (2020-04-28T17:59:01Z)
- Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)