Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?
- URL: http://arxiv.org/abs/2411.11401v2
- Date: Wed, 20 Nov 2024 09:44:18 GMT
- Title: Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?
- Authors: Rosalia Tufano, Alberto Martin-Lopez, Ahmad Tayeb, Ozren Dabić, Sonia Haiduc, Gabriele Bavota
- Abstract summary: We run a controlled experiment with 29 experts who reviewed different programs with/without the support of an automatically generated code review.
We show that reviewers consider valid most of the issues automatically identified by the LLM and that the availability of an automated review as a starting point strongly influences their behavior.
Reviewers who started from an automated review identified more low-severity issues, but no more high-severity issues, than those following a completely manual process.
- Score: 14.970843824847956
- Abstract: Several techniques have been proposed to automate code review. Early support consisted of recommending the most suited reviewer for a given change or of prioritizing the review tasks. With the advent of deep learning in software engineering, the level of automation has been pushed to new heights, with approaches able to provide feedback on source code in natural language as a human reviewer would do. Also, recent work documented open source projects adopting Large Language Models (LLMs) as co-reviewers. Although the research in this field is very active, little is known about the actual impact of including automatically generated code reviews in the code review process. While there are many aspects worth investigating, in this work we focus on three of them: (i) review quality, i.e., the reviewer's ability to identify issues in the code; (ii) review cost, i.e., the time spent reviewing the code; and (iii) reviewer's confidence, i.e., how confident the reviewer is about the provided feedback. We run a controlled experiment with 29 experts who reviewed different programs with/without the support of an automatically generated code review. During the experiment, we monitored the reviewers' activities, collecting over 50 hours of recorded code reviews. We show that reviewers consider valid most of the issues automatically identified by the LLM and that the availability of an automated review as a starting point strongly influences their behavior: reviewers tend to focus on the code locations indicated by the LLM rather than searching for additional issues in other parts of the code. Reviewers who started from an automated review identified more low-severity issues, but no more high-severity issues, than those following a completely manual process. Finally, the automated support did not result in saved time and did not increase the reviewers' confidence.
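To make the setting concrete, below is a minimal sketch of how an automated first-pass code review could be generated with an LLM. It assumes an OpenAI-style chat API; the prompt wording and model name are illustrative choices, not the setup used in the paper.

```python
# Minimal sketch of an automated first-pass code review. The prompt
# wording and model name are assumptions, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_review(source_code: str) -> str:
    """Ask the model for issues a human reviewer could then triage."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List potential issues, "
                        "each with a location, a severity, and a rationale."},
            {"role": "user", "content": source_code},
        ],
    )
    return response.choices[0].message.content

print(generate_review("def add(a, b):\n    return a - b"))
```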
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Predicting Expert Evaluations in Software Code Reviews [8.012861163935904]
This paper presents an algorithmic model that automates aspects of code review typically avoided due to their complexity or subjectivity.
Instead of replacing manual reviews, our model adds insights that help reviewers focus on more impactful tasks.
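As a loose illustration of such an algorithmic model, the sketch below predicts a review outcome from simple change metrics with scikit-learn; the features, labels, and classifier are assumptions, not the paper's model.

```python
# Sketch: predict an expert review outcome (1 = approved without rework,
# 0 = rework requested) from change metrics. Features, labels, and the
# classifier are illustrative assumptions, not the paper's model.
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical features per change: [lines added, lines deleted,
# files touched, author's prior changes to these files]
X = [
    [120, 30, 5, 2],
    [10, 2, 1, 40],
    [300, 150, 12, 0],
    [25, 5, 2, 15],
]
y = [0, 1, 0, 1]

model = GradientBoostingClassifier().fit(X, y)
print(model.predict([[50, 10, 3, 8]]))
```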
arXiv Detail & Related papers (2024-09-23T16:01:52Z)
- Leveraging Reviewer Experience in Code Review Comment Generation [11.224317228559038]
We train deep learning models to imitate human reviewers in providing natural language code reviews.
The quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training.
We propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality.
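One way to read "experience as a signal" is to weight each training example by its reviewer's track record. A minimal sketch under that assumption follows; the log-damped weighting scheme is hypothetical, not the paper's method.

```python
# Sketch: weight each training example's loss by the reviewer's
# experience, so seasoned reviewers' comments contribute more.
# The log-damped weighting is a hypothetical scheme.
import torch
import torch.nn.functional as F

def experience_weighted_loss(logits, targets, reviews_authored):
    weights = torch.log1p(reviews_authored.float())  # more reviews -> larger weight
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_example).mean()

logits = torch.randn(4, 3)                        # 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])
reviews_authored = torch.tensor([5, 200, 40, 1])  # per-example reviewer experience
print(experience_weighted_loss(logits, targets, reviews_authored))
```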
arXiv Detail & Related papers (2024-09-17T07:52:50Z)
- Improving Automated Code Reviews: Learning from Experience [12.573740138977065]
This study investigates whether higher-quality reviews can be generated from automated code review models.
We find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews.
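The oversampling idea itself is simple to sketch: duplicate training pairs that come from experienced reviewers so the model sees proportionally more of them. The threshold and duplication factor below are assumptions, not the paper's exact recipe.

```python
# Sketch: experience-aware oversampling of (diff, review) training pairs.
# The threshold and duplication factor are assumptions.
def oversample_by_experience(pairs, min_reviews=50, factor=3):
    """pairs: list of (code_diff, review_comment, reviewer_review_count)."""
    out = []
    for diff, comment, count in pairs:
        copies = factor if count >= min_reviews else 1
        out.extend([(diff, comment, count)] * copies)
    return out

data = [("diff-a", "rename this variable for clarity", 120),
        ("diff-b", "lgtm", 3)]
print(len(oversample_by_experience(data)))  # 4: three copies of the first pair, one of the second
```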
arXiv Detail & Related papers (2024-02-06T07:48:22Z)
- Code Review Automation: Strengths and Weaknesses of the State of the Art [14.313783664862923]
This paper analyzes where three code review automation techniques tend to succeed or fail on two code review tasks.
The study has a strong qualitative focus, with 105 man-hours of manual inspection invested in analyzing correct and wrong predictions.
arXiv Detail & Related papers (2024-01-10T13:00:18Z)
- Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and Bystanders [6.538051328482194]
We build upon the recommender that has been in production since 2018, RevRecV1.
We find that reviewers were being assigned based on prior authorship of files.
Having an individual who is responsible for the review reduces review time by 11%.
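A minimal sketch of the prior-authorship signal: rank candidate reviewers by how often they authored the files touched in a change. The data layout is hypothetical, not RevRecV1's.

```python
# Sketch: recommend reviewers by prior authorship of the changed files.
# The data layout is hypothetical, not RevRecV1's.
from collections import Counter

def recommend_reviewers(changed_files, authorship, top_k=2):
    """authorship: dict mapping file path -> list of past authors."""
    scores = Counter()
    for path in changed_files:
        scores.update(authorship.get(path, []))
    return [name for name, _ in scores.most_common(top_k)]

authorship = {
    "src/app.py": ["alice", "bob", "alice"],
    "src/db.py": ["carol"],
}
print(recommend_reviewers(["src/app.py", "src/db.py"], authorship))  # ['alice', 'bob']
```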
arXiv Detail & Related papers (2023-12-28T17:55:13Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
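The few-shot mechanism can be sketched in the style of prototypical networks: average a few labeled examples per feedback class into prototypes, then label new submissions by nearest prototype. Random vectors below stand in for a real code encoder.

```python
# Sketch: prototypical-network-style few-shot feedback classification.
# Random vectors stand in for a real code encoder.
import torch

def classify(query, support, labels, num_classes):
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack([support[labels == c].mean(dim=0)
                              for c in range(num_classes)])
    dists = torch.cdist(query.unsqueeze(0), prototypes)  # (1, num_classes)
    return dists.argmin(dim=1).item()

support = torch.randn(6, 16)               # 6 instructor-labeled examples
labels = torch.tensor([0, 0, 1, 1, 2, 2])  # 3 feedback classes
query = torch.randn(16)                    # a new student submission
print(classify(query, support, labels, num_classes=3))
```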
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Can We Automate Scientific Reviewing? [89.50052670307434]
We discuss the possibility of using state-of-the-art natural language processing (NLP) models to generate first-pass peer reviews for scientific papers.
We collect a dataset of papers in the machine learning domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers to generate reviews.
Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews.
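As a rough stand-in for such targeted summarizers, the sketch below drafts a first-pass review with an off-the-shelf summarization model; the checkpoint is a generic assumption, not the paper's trained system.

```python
# Sketch: draft a first-pass review by summarizing the paper text with
# an off-the-shelf model; not the targeted summarizers from the paper.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
paper_text = "We propose a method for ... (full paper text goes here)"
draft = summarizer(paper_text, max_length=120, min_length=40)
print(draft[0]["summary_text"])
```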
arXiv Detail & Related papers (2021-01-30T07:16:53Z)
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
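The framing can be sketched as binary classification over a (comment, code-change) pair; the encoders and classifier head below are placeholders, not the paper's architecture.

```python
# Sketch: just-in-time inconsistency detection as binary classification
# over a (comment, code-change) pair. Encoders and head are placeholders.
import torch
import torch.nn as nn

class InconsistencyDetector(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * dim, dim),
                                  nn.ReLU(),
                                  nn.Linear(dim, 1))

    def forward(self, comment_emb, change_emb):
        # Output near 1 = comment likely inconsistent with the edited code.
        pair = torch.cat([comment_emb, change_emb], dim=-1)
        return torch.sigmoid(self.head(pair))

model = InconsistencyDetector()
comment_emb = torch.randn(1, 64)  # stand-in for an encoded comment
change_emb = torch.randn(1, 64)   # stand-in for an encoded code diff
print(model(comment_emb, change_emb))
```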
arXiv Detail & Related papers (2020-10-04T16:49:28Z)
- Code Review in the Classroom [57.300604527924015]
Code reviews performed by young developers in a classroom setting give a clear picture of the favourable and problematic areas of the code review process.
Their feedback suggests that the process was well received, along with some suggestions for improving it.
This paper can serve as a set of guidelines for performing code reviews in the classroom.
arXiv Detail & Related papers (2020-04-19T06:07:45Z)
- Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
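For context on the metric, a minimal sketch of scoring a generated response against the developer-written reference with sentence-level BLEU-4; the example strings are made up.

```python
# Sketch: BLEU-4 between a generated response and the developer-written
# reference (the metric reported for RRGen). Example strings are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "thanks for the report , the crash is fixed in version 2.1".split()
generated = "thanks for reporting , this crash is fixed in version 2.1".split()

score = sentence_bleu([reference], generated,
                      weights=(0.25, 0.25, 0.25, 0.25),  # BLEU-4
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")
```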
arXiv Detail & Related papers (2020-02-10T05:23:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.