Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions
- URL: http://arxiv.org/abs/2508.18771v1
- Date: Tue, 26 Aug 2025 07:55:23 GMT
- Title: Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions
- Authors: Kexin Sun, Hongyu Kuang, Sebastian Baltes, Xin Zhou, He Zhang, Xiaoxing Ma, Guoping Rong, Dong Shao, Christoph Treude
- Abstract summary: AI-based code review tools automatically review and comment on pull requests to improve code quality. We present a large-scale empirical study of 16 popular AI-based code review actions for GitHub. We investigate how these tools are adopted and configured, whether their comments lead to code changes, and which factors influence their effectiveness.
- Score: 21.347559936084807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-based code review tools automatically review and comment on pull requests to improve code quality. Despite their growing presence, little is known about their actual impact. We present a large-scale empirical study of 16 popular AI-based code review actions for GitHub workflows, analyzing more than 22,000 review comments in 178 repositories. We investigate (1) how these tools are adopted and configured, (2) whether their comments lead to code changes, and (3) which factors influence their effectiveness. We develop a two-stage LLM-assisted framework to determine whether review comments are addressed, and use interpretable machine learning to identify influencing factors. Our findings show that, while adoption is growing, effectiveness varies widely. Comments that are concise, contain code snippets, and are manually triggered, particularly those from hunk-level review tools, are more likely to result in code changes. These results highlight the importance of careful tool design and suggest directions for improving AI-based code review systems.
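The paper's two-stage framework (first deciding whether a comment is actionable, then whether later commits address it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the LLM queries used in the actual framework are replaced here by trivial heuristic stubs purely to show the pipeline shape.

```python
# Hypothetical sketch of a two-stage pipeline for deciding whether an AI
# review comment was addressed by subsequent commits. Both stages are
# heuristic stubs standing in for the paper's LLM-assisted classifiers.

def stage1_is_actionable(comment: str) -> bool:
    """Stage 1 (stub): filter out non-actionable comments (praise,
    summaries) before the more expensive comment-vs-diff comparison."""
    non_actionable_prefixes = ("lgtm", "nice work", "summary:")
    return not comment.lower().startswith(non_actionable_prefixes)

def stage2_is_addressed(comment: str, later_diff: str) -> bool:
    """Stage 2 (stub): decide whether the follow-up diff resolves the
    comment. Naive keyword overlap stands in for an LLM judgment."""
    keywords = {w.strip("`.,!?") for w in comment.lower().split() if len(w) > 4}
    return any(w and w in later_diff.lower() for w in keywords)

def comment_led_to_change(comment: str, later_diff: str) -> bool:
    """Full pipeline: a comment counts as 'addressed' only if it is
    actionable AND the later diff appears to resolve it."""
    return stage1_is_actionable(comment) and stage2_is_addressed(comment, later_diff)

# An actionable rename suggestion matched by the follow-up diff:
print(comment_led_to_change(
    "Consider renaming `tmp_result` for clarity.",
    "- tmp_result = fetch()\n+ parsed_response = fetch()",
))  # True

# A non-actionable comment is filtered out in stage 1:
print(comment_led_to_change("LGTM, nice work!", "+ parsed_response = fetch()"))  # False
```

The two-stage split matters for cost: in the study's setting, a cheap first pass discards comments that could never trigger a change, so the expensive comparison only runs on candidates.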
Related papers
- What Types of Code Review Comments Do Developers Most Frequently Resolve? [10.277847378685161]
Large language model (LLM)-powered code review automation tools have been introduced to generate code review comments. This paper investigates the types of review comments written by humans and LLMs, and the types of generated comments that are most frequently resolved by developers.
arXiv Detail & Related papers (2025-10-06T23:32:26Z)
- CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
- Measuring the effectiveness of code review comments in GitHub repositories: A machine learning approach [0.969054772470341]
This paper presents an empirical study of the effectiveness of machine learning techniques in classifying code review text by semantic meaning. We manually labelled 13,557 code review comments generated by three open-source projects on GitHub. To recognize the sentiment polarity (or sentiment orientation) of code reviews, we apply seven machine learning algorithms and compare their results to identify the best-performing ones.
arXiv Detail & Related papers (2025-08-22T03:00:48Z)
- Code Review as Decision-Making -- Building a Cognitive Model from the Questions Asked During Code Review [2.8299846354183953]
We build a cognitive model of code review bottom-up through thematic, statistical, temporal, and sequential analysis of the transcribed material. The model shows how developers move through two phases during the code review: first an orientation phase to establish context and rationale, then an analytical phase to understand, assess, and plan the rest of the review.
arXiv Detail & Related papers (2025-07-13T14:04:16Z)
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy-thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting, but instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The logical comment decoding strategy is also notably more robust than Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- Code Review Automation: Strengths and Weaknesses of the State of the Art [14.313783664862923]
This paper examines when three code review automation techniques tend to succeed or fail across two tasks.
The study has a strong qualitative focus, with 105 man-hours of manual inspection invested in analyzing correct and wrong predictions.
arXiv Detail & Related papers (2024-01-10T13:00:18Z)
- AI Explainability 360: Impact and Design [120.95633114160688]
In 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods.
This paper examines the impact of the toolkit with several case studies, statistics, and community feedback.
The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.
arXiv Detail & Related papers (2021-09-24T19:17:09Z)
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.