Related papers: Code Review Automation using Retrieval Augmented Generation

URL: http://arxiv.org/abs/2511.05302v1
Date: Fri, 07 Nov 2025 15:02:42 GMT
Title: Code Review Automation using Retrieval Augmented Generation
Authors: Qianru Meng, Xiao Zhang, Zhaochen Ren, Joost Visser
Abstract summary: Code review is essential for maintaining software quality but is labor-intensive. Deep learning-based generative techniques and retrieval-based methods have demonstrated strong performance in this task. We introduce Retrieval-Augmented Reviewer (RARe), which combines retrieval-based and generative methods.
Score: 3.438467395627969
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code review is essential for maintaining software quality but is labor-intensive. Automated code review generation offers a promising solution to this challenge. Both deep learning-based generative techniques and retrieval-based methods have demonstrated strong performance in this task. However, despite these advancements, there are still some limitations where generated reviews can be either off-point or overly general. To address these issues, we introduce Retrieval-Augmented Reviewer (RARe), which leverages Retrieval-Augmented Generation (RAG) to combine retrieval-based and generative methods, explicitly incorporating external domain knowledge into the code review process. RARe uses a dense retriever to select the most relevant reviews from the codebase, which then enrich the input for a neural generator, utilizing the contextual learning capacity of large language models (LLMs), to produce the final review. RARe outperforms state-of-the-art methods on two benchmark datasets, achieving BLEU-4 scores of 12.32 and 12.96, respectively. Its effectiveness is further validated through a detailed human evaluation and a case study using an interpretability tool, demonstrating its practical utility and reliability.
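
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-count `embed` function is a toy stand-in for a dense retriever, `CORPUS` is invented example data, and the names `retrieve` and `build_prompt` are hypothetical. The sketch stops at building the enriched prompt; passing it to an LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: counts of lowercase tokens."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query_code, corpus, k=2):
    """Return the k (code, review) pairs whose code is most similar to the query."""
    q = embed(query_code)
    ranked = sorted(corpus, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query_code, exemplars):
    """Enrich the generator input with the retrieved reviews."""
    parts = ["You are a code reviewer. Similar past changes and their reviews:"]
    for i, (code, review) in enumerate(exemplars, 1):
        parts.append(f"Example {i} code:\n{code}\nExample {i} review:\n{review}")
    parts.append(f"Code to review:\n{query_code}\nReview:")
    return "\n\n".join(parts)


# Hypothetical (code, review) pairs standing in for a review corpus.
CORPUS = [
    ("def read_file(path): return open(path).read()",
     "Use a context manager so the file is closed."),
    ("for i in range(len(items)): print(items[i])",
     "Iterate over items directly instead of indexing."),
    ("password = 'hunter2'",
     "Do not hard-code credentials."),
]

query = "def load(path): data = open(path).read(); return data"
top = retrieve(query, CORPUS, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real system the toy `embed` would be replaced by a learned dense encoder and the prompt would be sent to a generator model; the structure of the pipeline stays the same.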

Related papers

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion [55.21541958868449]
We propose AlignCoder, a repository-level code completion framework. Our framework generates an enhanced query that bridges the semantic gap between the initial query and the target code. We employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval.
arXiv Detail & Related papers (2026-01-27T15:23:14Z)
High-quality data augmentation for code comment classification [0.48429188360918735]
Since comments are in natural language, they present challenges for machine-based code understanding. Existing datasets for this task suffer from size limitations and class imbalance. We introduce new synthetic oversampling and augmentation techniques based on high-quality data generation to enhance the NLBSE'26 challenge datasets.
arXiv Detail & Related papers (2026-01-27T09:14:56Z)
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks [75.52891348667491]
Open-ended generation tasks require outputs to satisfy diverse and often implicit task-specific evaluation rubrics. The sheer number of relevant rubrics leads to prohibitively high verification costs and incomplete assessments of a response. We propose Reinforcement Learning with Adversarial Critic (RLAC), a post-training approach that addresses these challenges via dynamic rubric verification.
arXiv Detail & Related papers (2025-11-03T17:15:05Z)
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities. We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z)
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
Retrieval-Augmented Code Review Comment Generation [0.0]
Automated code review comment generation (RCG) aims to assist developers by automatically producing natural language feedback for code changes. Existing approaches are primarily either generation-based, using pretrained language models, or information retrieval (IR)-based, reusing comments from similar past examples. This work proposes to leverage retrieval-augmented generation (RAG) for RCG by conditioning pretrained language models on retrieved code-review exemplars.
arXiv Detail & Related papers (2025-06-13T08:58:20Z)
Leveraging Reward Models for Guiding Code Review Comment Generation [13.306560805316103]
Code review is a crucial component of modern software development, involving the evaluation of code quality, providing feedback on potential issues, and refining the code to address identified problems. Deep learning techniques are able to tackle the generative aspect of code review by commenting on a given piece of code as a human reviewer would. In this paper, we introduce CoRAL, a deep learning framework automating review comment generation by exploiting reinforcement learning with a reward mechanism.
arXiv Detail & Related papers (2025-06-04T21:31:38Z)
ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z)
Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation [5.6001617185032595]
Large language models pretrained on both programming and natural language data tend to perform well in code-oriented tasks. We fine-tune open-source large language models (LLMs) in a parameter-efficient, quantized low-rank fashion on consumer-grade hardware to improve review comment generation.
arXiv Detail & Related papers (2024-11-15T12:01:38Z)
Assessing the Answerability of Queries in Retrieval-Augmented Code Generation [7.68409881755304]
This study proposes a task for evaluating answerability, which assesses whether valid answers can be generated. We build a benchmark dataset called Retrieval-augmented Code Generability Evaluation (RaCGEval) to evaluate the performance of models performing this task.
arXiv Detail & Related papers (2024-11-08T13:09:14Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
Repoformer: Selective Retrieval for Repository-Level Code Completion [30.706277772743615]
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. In this paper, we propose a selective RAG framework to avoid retrieval when unnecessary. We show that our framework is able to accommodate different generation models, retrievers, and programming languages.
arXiv Detail & Related papers (2024-03-15T06:59:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.