Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and
Bystanders
- URL: http://arxiv.org/abs/2312.17169v1
- Date: Thu, 28 Dec 2023 17:55:13 GMT
- Title: Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and
Bystanders
- Authors: Peter C. Rigby, Seth Rogers, Sadruddin Saleem, Parth Suresh, Daniel
Suskin, Patrick Riggs, Chandra Maddila, Nachiappan Nagappan
- Abstract summary: We build upon the recommender that has been in production since 2018 RevRecV1.
We find that reviewers were being assigned based on prior authorship of files.
Having an individual who is responsible for the review reduces the time taken for reviews by 11%.
- Score: 6.538051328482194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code review ensures that a peer engineer manually examines the code before it
is integrated and released into production. At Meta, we develop a wide range of
software at scale, from social networking to software development
infrastructure, such as calendar and meeting tools, to continuous integration.
We are constantly improving our code review system, and in this work we
describe a series of experiments that were conducted across tens of thousands
of engineers and hundreds of thousands of reviews.
We build upon the recommender that has been in production since 2018,
RevRecV1. We found that reviewers were being assigned based on prior authorship
of files. We reviewed the literature for successful features and experimented
with them in production with RevRecV2. The most important feature in our new
model was the familiarity of the author and reviewer; with it, we saw an overall
improvement in accuracy of 14 percentage points.
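A minimal sketch of the kind of scoring this describes, combining prior file authorship with author-reviewer familiarity. The feature weights, data shapes, and function names here are illustrative assumptions, not the production RevRecV2 model:

```python
# Illustrative reviewer-scoring sketch: prior authorship of the changed files
# plus familiarity (past author-reviewer interactions). Weights are assumed.
from collections import defaultdict

def score_candidates(changed_files, authorship, interactions, author):
    """Return (reviewer, score) pairs sorted by descending score."""
    scores = defaultdict(float)
    for f in changed_files:
        for reviewer, commits in authorship.get(f, {}).items():
            if reviewer == author:
                continue
            scores[reviewer] += commits                 # prior authorship signal
    for reviewer in list(scores):
        familiarity = interactions.get((author, reviewer), 0)
        scores[reviewer] += 2.0 * familiarity           # assumed familiarity weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: "bob" co-owns the changed files and has interacted with the author
# before, so he ranks ahead of "carol".
authorship = {"a.py": {"bob": 5, "carol": 2}, "b.py": {"bob": 1}}
interactions = {("alice", "bob"): 3}
print(score_candidates(["a.py", "b.py"], authorship, interactions, "alice"))
```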
Prior research has shown that reviewer workload is skewed. To balance
workload, we divide the reviewer score from RevRecV2 by each candidate
reviewer's workload. We experimented with multiple types of workload to develop
RevRecWL. We find that reranking candidate reviewers by workload often leads to
reviewers with lower workload being selected by authors.
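A minimal sketch of that reranking step, dividing each candidate's relevance score by their workload. Treating "open reviews" as the workload measure and adding +1 smoothing are assumptions for illustration:

```python
# Workload reranking sketch in the spirit of RevRecWL: relevance score divided
# by current workload. Smoothing constant and workload definition are assumed.
def rerank_by_workload(scored_candidates, open_reviews):
    """scored_candidates: list of (reviewer, revrec_v2_score) pairs.
    open_reviews: dict mapping reviewer -> number of pending reviews."""
    reranked = [
        (reviewer, score / (open_reviews.get(reviewer, 0) + 1))
        for reviewer, score in scored_candidates
    ]
    return sorted(reranked, key=lambda kv: kv[1], reverse=True)

# "bob" scores higher on relevance but carries more open reviews,
# so the lightly loaded "carol" moves to the top.
candidates = [("bob", 12.0), ("carol", 9.0)]
print(rerank_by_workload(candidates, {"bob": 6, "carol": 1}))
```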
The bystander effect can occur when a team of reviewers is assigned the
review. We mitigate the bystander effect by randomly assigning one of the
recommended reviewers. Having an individual who is responsible for the review
reduces the time taken for reviews by 11%.
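A small sketch of that mitigation: rather than assigning the whole recommended team, pick a single recommended reviewer at random to own the review. The data shape and helper name below are illustrative assumptions:

```python
# Bystander-effect mitigation sketch: assign one responsible reviewer at random
# from the recommended set instead of notifying the whole team.
import random

def assign_responsible_reviewer(recommended, seed=None):
    """Pick a single responsible reviewer from the recommended reviewers."""
    rng = random.Random(seed)
    return rng.choice(recommended)

team = ["bob", "carol", "dana"]
print(assign_responsible_reviewer(team, seed=42))
```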
Related papers
- CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells [15.66562304661042]
We develop CRScore to measure dimensions of review quality like conciseness, comprehensiveness, and relevance.
We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment.
We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
arXiv Detail & Related papers (2024-09-29T21:53:18Z) - Leveraging Reviewer Experience in Code Review Comment Generation [11.224317228559038]
We train deep learning models to imitate human reviewers in providing natural language code reviews.
The quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training.
We propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality.
arXiv Detail & Related papers (2024-09-17T07:52:50Z) - Factoring Expertise, Workload, and Turnover into Code Review
Recommendation [4.492444446637857]
We show that code review naturally spreads knowledge, thereby reducing the files at risk to turnover.
We develop novel recommenders to understand their impact on the level of expertise during review.
We are able to globally increase expertise during reviews, +3%, reduce workload concentration, -12%, and reduce the files at risk, -28%.
arXiv Detail & Related papers (2023-12-28T18:58:06Z) - Using Large-scale Heterogeneous Graph Representation Learning for Code
Review Recommendations [7.260832843615661]
We present CORAL, a novel approach to reviewer recommendation.
We use a socio-technical graph built from the rich set of entities.
We show that CORAL is able to model the manual history of reviewer selection remarkably well.
arXiv Detail & Related papers (2022-02-04T20:58:54Z) - Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast it as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Can We Automate Scientific Reviewing? [89.50052670307434]
We discuss the possibility of using state-of-the-art natural language processing (NLP) models to generate first-pass peer reviews for scientific papers.
We collect a dataset of papers in the machine learning domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers to generate reviews.
Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews.
arXiv Detail & Related papers (2021-01-30T07:16:53Z) - Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in
Conference Peer Review [35.24369486197371]
Modern machine learning and computer science conferences are experiencing a surge in the number of submissions that challenges the quality of peer review.
Several conferences have started encouraging or even requiring authors to declare the previous submission history of their papers.
We investigate whether reviewers exhibit a bias caused by the knowledge that the submission under review was previously rejected at a similar venue.
arXiv Detail & Related papers (2020-11-30T09:35:37Z) - ReviewRobot: Explainable Paper Review Generation based on Knowledge
Synthesis [62.76038841302741]
We build a novel ReviewRobot to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison.
Experimental results show that our review score predictor reaches 71.4%-100% accuracy.
Human assessment by domain experts shows that 41.7%-70.5% of the comments generated by ReviewRobot are valid and constructive, and better than human-written ones 20% of the time.
arXiv Detail & Related papers (2020-10-13T02:17:58Z) - Code Review in the Classroom [57.300604527924015]
Young developers in a classroom setting provide a clear picture of the potential favourable and problematic areas of the code review process.
Their feedback suggests that the process has been well received, with some suggestions for improving it.
This paper can be used as guidelines to perform code reviews in the classroom.
arXiv Detail & Related papers (2020-04-19T06:07:45Z) - Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)