What Types of Code Review Comments Do Developers Most Frequently Resolve?
- URL: http://arxiv.org/abs/2510.05450v1
- Date: Mon, 06 Oct 2025 23:32:26 GMT
- Title: What Types of Code Review Comments Do Developers Most Frequently Resolve?
- Authors: Saul Goldman, Hong Yi Lin, Jirat Pasuksmit, Patanamon Thongtanunam, Kla Tantithamthavorn, Zhe Wang, Ray Zhang, Ali Behnaz, Fan Jiang, Michael Siers, Ryan Jiang, Mike Buller, Minwoo Jeong, Ming Wu
- Abstract summary: Large language model (LLM)-powered code review automation tools have been introduced to generate code review comments. This paper investigates the types of review comments written by humans and LLMs, and the types of generated comments that are most frequently resolved by developers.
- Score: 10.277847378685161
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language model (LLM)-powered code review automation tools have been introduced to generate code review comments. However, not all generated comments will drive code changes. Understanding what types of generated review comments are likely to trigger code changes is crucial for identifying those that are actionable. In this paper, we set out to investigate (1) the types of review comments written by humans and LLMs, and (2) the types of generated comments that are most frequently resolved by developers. To do so, we developed an LLM-as-a-Judge to automatically classify review comments based on our own taxonomy of five categories. Our empirical study confirms that (1) the LLM reviewer and human reviewers exhibit distinct strengths and weaknesses depending on the project context, and (2) readability, bugs, and maintainability-related comments had higher resolution rates than those focused on code design. These results suggest that a substantial proportion of LLM-generated comments are actionable and can be resolved by developers. Our work highlights the complementarity between LLM and human reviewers and offers suggestions to improve the practical effectiveness of LLM-powered code review tools.
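The classification step described in the abstract (an LLM-as-a-Judge that sorts each review comment into one of five categories) can be sketched roughly as below. The category labels, prompt wording, and stub model call are illustrative assumptions, not the paper's actual taxonomy or implementation:

```python
# Illustrative sketch of an LLM-as-a-Judge comment classifier. The labels and
# prompt are hypothetical stand-ins, not the paper's actual five-way taxonomy.
CATEGORIES = ["readability", "bug", "maintainability", "code design", "other"]

def build_judge_prompt(comment: str) -> str:
    """Compose a single-label classification prompt for the judge model."""
    labels = ", ".join(CATEGORIES)
    return (
        "Classify the following code review comment into exactly one of "
        f"these categories: {labels}.\n"
        f"Comment: {comment}\n"
        "Answer with the category name only."
    )

def parse_judge_reply(reply: str) -> str:
    """Map the judge's free-text reply onto a known category; default to 'other'."""
    normalized = reply.strip().lower()
    for category in CATEGORIES:
        if category in normalized:
            return category
    return "other"

def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call, so the pipeline runs offline."""
    return "Bug"

comment = "This loop reads past the end of the buffer when n == 0."
label = parse_judge_reply(fake_llm(build_judge_prompt(comment)))
print(label)  # → bug
```

In practice the stub would be replaced by a call to the judge model, and the lenient substring matching in `parse_judge_reply` guards against replies that wrap the label in extra words.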
Related papers
- Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback [75.31379834079648]
We introduce an LLM-driven framework that decomposes reviews into argumentative segments. We also release LazyReviewPlus, a dataset of 1,309 sentences labeled for lazy thinking and specificity.
arXiv Detail & Related papers (2026-01-17T20:32:18Z)
- RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian [6.019311657369222]
Large Language Model (LLM)-powered code review automation has the potential to streamline code review. This paper aims to answer the practical question: how can we design review-guided, context-aware, quality-checked code review comment generation without fine-tuning? We present RovoDev Code Reviewer, an enterprise-grade LLM-based code review automation tool designed and deployed at scale within Atlassian's development ecosystem, with integration into Atlassian's Bitbucket.
arXiv Detail & Related papers (2026-01-03T09:27:56Z)
- LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM [17.54065758880181]
This paper proposes LAURA, an LLM-based, review-knowledge-augmented, context-aware framework for code review generation. The framework integrates review retrieval, context augmentation, and systematic guidance to enhance the performance of ChatGPT-4o and DeepSeek v3 in generating code review comments.
arXiv Detail & Related papers (2025-12-01T07:10:23Z)
- Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions [21.347559936084807]
AI-based code review tools automatically review and comment on pull requests to improve code quality. We present a large-scale empirical study of 16 popular AI-based code review actions for GitHub. We investigate how these tools are adopted and configured, whether their comments lead to code changes, and which factors influence their effectiveness.
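Measuring whether comments lead to code changes reduces to a per-category resolution rate. A minimal sketch with made-up records follows; the categories and data are illustrative, not results from any of these studies:

```python
from collections import Counter

# Hypothetical (category, resolved) records; purely illustrative data.
records = [
    ("bug", True), ("bug", True), ("readability", True),
    ("code design", False), ("code design", True), ("maintainability", True),
]

def resolution_rates(records):
    """Fraction of comments in each category that led to a code change."""
    totals, resolved = Counter(), Counter()
    for category, was_resolved in records:
        totals[category] += 1
        resolved[category] += int(was_resolved)
    return {cat: resolved[cat] / totals[cat] for cat in totals}

rates = resolution_rates(records)
print(rates["bug"])          # → 1.0
print(rates["code design"])  # → 0.5
```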
arXiv Detail & Related papers (2025-08-26T07:55:23Z)
- Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification [4.61232919707345]
Large Language Models (LLMs) can classify 17 categories of code review comments. LLMs achieve better accuracy when classifying the five most useful categories. These results suggest that LLMs could offer a scalable solution for code review analytics.
arXiv Detail & Related papers (2025-08-13T14:07:05Z)
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraint generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Leveraging Reward Models for Guiding Code Review Comment Generation [13.306560805316103]
Code review is a crucial component of modern software development, involving the evaluation of code quality, feedback on potential issues, and refinement of the code to address identified problems. Deep learning techniques can tackle the generative aspect of code review by commenting on given code as a human reviewer would. In this paper, we introduce CoRAL, a deep learning framework that automates review comment generation using reinforcement learning with a reward mechanism.
arXiv Detail & Related papers (2025-06-04T21:31:38Z)
- CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models [97.18215355266143]
We introduce CodeCriticBench, a holistic code critique benchmark for Large Language Models (LLMs). Specifically, CodeCriticBench includes two mainstream code tasks (i.e., code generation and code QA) at different difficulty levels. In addition, the evaluation protocols include both basic and advanced critique evaluation for different characteristics.
arXiv Detail & Related papers (2025-02-23T15:36:43Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate over strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs) and obtain the suggested code changes (patch sets) derived from real-world code review comments.
The performance of each model is assessed by comparing its generated patch sets against historical human-generated patch sets.
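One simple way to compare a generated patch set against the historical human patch is a line-level similarity ratio. The metric below is an illustrative stand-in, not the evaluation method used in the paper:

```python
import difflib

def patch_similarity(generated: str, historical: str) -> float:
    """Line-level similarity between two unified-diff patches, in [0, 1]."""
    return difflib.SequenceMatcher(
        None, generated.splitlines(), historical.splitlines()
    ).ratio()

human_patch = "-    if x == None:\n+    if x is None:"
model_patch = "-    if x == None:\n+    if x is None:  # explicit check"
print(patch_similarity(model_patch, human_patch))  # → 0.5
```

Here the removal lines match while the addition lines differ, so exactly half of the four diff lines align and the ratio is 0.5.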
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a large performance boost across multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for the performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- Exploring the Advances in Identifying Useful Code Review Comments [0.0]
This paper reflects the evolution of research on the usefulness of code review comments.
It examines papers that define the usefulness of code review comments, mine and annotate datasets, study developers' perceptions, analyze factors from different aspects, and use machine learning classifiers to automatically predict the usefulness of code review comments.
arXiv Detail & Related papers (2023-07-03T00:41:20Z)
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
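A crude lexical baseline for such inconsistency detection (far weaker than the paper's deep-learning approach, shown here only to make the task concrete) is to check whether identifiers mentioned in the comment survive the code change:

```python
import re

def identifier_tokens(text: str) -> set[str]:
    """Extract identifier-like tokens from comment or code text."""
    return set(re.findall(r"\b[a-zA-Z_][a-zA-Z0-9_]*\b", text))

def possibly_inconsistent(comment: str, new_code: str) -> bool:
    """Flag the comment if a code-like name it mentions vanished from the code."""
    code_tokens = identifier_tokens(new_code)
    # Keep only snake_case or camelCase tokens, to skip plain English words.
    names = {
        t for t in identifier_tokens(comment)
        if "_" in t or (t[:1].islower() and any(c.isupper() for c in t[1:]))
    }
    return any(name not in code_tokens for name in names)

old_comment = "# caches the result in result_cache to avoid recomputation"
new_code = "def compute(x):\n    return x * 2  # cache removed"
print(possibly_inconsistent(old_comment, new_code))  # → True
```

The heuristic flags the example because `result_cache` no longer appears in the updated code; a learned model would instead correlate the comment with the semantics of the code change.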
arXiv Detail & Related papers (2020-10-04T16:49:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.