Improving Automated Code Reviews: Learning from Experience
- URL: http://arxiv.org/abs/2402.03777v1
- Date: Tue, 6 Feb 2024 07:48:22 GMT
- Title: Improving Automated Code Reviews: Learning from Experience
- Authors: Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Wachiraphan
Charoenwet
- Abstract summary: This study investigates whether higher-quality reviews can be generated from automated code review models.
We find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews.
- Score: 12.573740138977065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern code review is a critical quality assurance process that is widely
adopted in both industry and open source software environments. This process
can help newcomers learn from the feedback of experienced reviewers; however,
it often brings a large workload and stress to reviewers. To alleviate this
burden, the field of automated code reviews aims to automate the process,
teaching large language models to provide reviews on submitted code, just as a
human would. A recent approach pre-trained and fine-tuned a code intelligence
language model on a large-scale code review corpus. However, such techniques
did not fully utilise quality reviews amongst the training data. Indeed,
reviewers with a higher level of experience or familiarity with the code will
likely provide deeper insights than the others. In this study, we set out to
investigate whether higher-quality reviews can be generated from automated code
review models that are trained based on an experience-aware oversampling
technique. Through our quantitative and qualitative evaluation, we find that
experience-aware oversampling can increase the correctness, level of
information, and meaningfulness of reviews generated by the current
state-of-the-art model without introducing new data. The results suggest that a
vast amount of high-quality reviews are underutilised with current training
strategies. This work sheds light on resource-efficient ways to boost automated
code review models.
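As a concrete, simplified illustration of experience-aware oversampling (the field names and median threshold below are hypothetical, not the paper's exact procedure), a fine-tuning set can be rebalanced by duplicating examples written by more experienced reviewers:

```python
import random

# Toy fine-tuning examples: each pairs a code diff with a human review comment
# and a (hypothetical) count of the reviewer's prior reviews in the project.
examples = [
    {"diff": "add null check", "review": "Guard against None here.", "reviewer_experience": 120},
    {"diff": "rename variable", "review": "Looks good.", "reviewer_experience": 2},
    {"diff": "add retry loop", "review": "Consider exponential backoff.", "reviewer_experience": 45},
]

def oversample_by_experience(data, max_copies=3):
    """Duplicate examples from more experienced reviewers so the model
    sees their (presumably higher-quality) reviews more often."""
    threshold = sorted(e["reviewer_experience"] for e in data)[len(data) // 2]  # median
    augmented = []
    for ex in data:
        copies = max_copies if ex["reviewer_experience"] >= threshold else 1
        augmented.extend([ex] * copies)
    random.shuffle(augmented)
    return augmented

train_set = oversample_by_experience(examples)
print(len(examples), "->", len(train_set), "examples after oversampling")
```

The key property is that no new data is introduced: the high-quality reviews already present in the corpus are simply seen more often during training.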
Related papers
- Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword? [14.970843824847956]
We run a controlled experiment with 29 experts who reviewed different programs with/without the support of an automatically generated code review.
We show that reviewers consider valid most of the issues automatically identified by the LLM and that the availability of an automated review as a starting point strongly influences their behavior.
Reviewers who started from an automated review identified more low-severity issues but did not identify more high-severity issues than reviewers following a completely manual process.
arXiv Detail & Related papers (2024-11-18T09:24:01Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Leveraging Reviewer Experience in Code Review Comment Generation [11.224317228559038]
We train deep learning models to imitate human reviewers in providing natural language code reviews.
The quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training.
We propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality.
arXiv Detail & Related papers (2024-09-17T07:52:50Z)
- AI-Assisted Assessment of Coding Practices in Modern Code Review [11.803776132972029]
AutoCommenter is an end-to-end system for learning and enforcing coding best practices.
This paper reports on the development, deployment, and evaluation of AutoCommenter.
arXiv Detail & Related papers (2024-05-22T11:57:18Z)
- ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation [0.6895577977557867]
Inspecting the effectiveness of the review process and continuously improving it can boost development productivity.
We propose ReviewRanker, a semi-supervised learning based system that aims to assign each code review a confidence score.
Our proposed method is trained on simple and well-defined labels provided by developers.
arXiv Detail & Related papers (2023-07-08T15:37:48Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
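For intuition, the toy sketch below shows token-level retrieval at decoding time in a kNN-LM style: the decoder state at each step queries a datastore of training-time hidden states, and the retrieved tokens' distribution is interpolated with the model's. This is only a loose illustration of the general mechanism, not Tram's architecture, and every value in it is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["return", "the", "sum", "of", "two", "numbers"]
V, D = len(vocab), 8

# Datastore: hidden states observed during training, each paired with the
# token that followed it (random placeholders here).
datastore_keys = rng.normal(size=(50, D))
datastore_tokens = rng.integers(0, V, size=50)

def retrieval_augmented_step(hidden, model_probs, k=5, lam=0.3):
    """Mix the model's next-token distribution with a distribution built
    from the k nearest datastore entries."""
    dists = np.linalg.norm(datastore_keys - hidden, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])
    retrieval_probs = np.zeros(V)
    for idx, w in zip(nearest, weights):
        retrieval_probs[datastore_tokens[idx]] += w
    retrieval_probs /= retrieval_probs.sum()
    return (1 - lam) * model_probs + lam * retrieval_probs

hidden = rng.normal(size=D)            # decoder state at this step
model_probs = np.full(V, 1.0 / V)      # placeholder model distribution
mixed = retrieval_augmented_step(hidden, model_probs)
print(vocab[int(np.argmax(mixed))])
```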
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- CodeReviewer: Pre-Training for Automating Code Review Activities [36.40557768557425]
This research focuses on utilizing pre-training techniques for the tasks in the code review scenario.
We collect a large-scale dataset of real world code changes and code reviews from open-source projects in nine of the most popular programming languages.
To better understand code diffs and reviews, we propose CodeReviewer, a pre-trained model that utilizes four pre-training tasks tailored specifically for the code review scenario.
arXiv Detail & Related papers (2022-03-17T05:40:13Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few examples provided by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
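The few-shot setup can be pictured with a prototypical-network style sketch: each feedback class's prototype is the mean embedding of a handful of instructor-labelled examples, and a new submission receives the label of its nearest prototype. The embedding function and class labels below are placeholders, not the paper's model.

```python
import numpy as np

def embed(code_text):
    """Placeholder embedding; a real system would use a trained encoder."""
    local = np.random.default_rng(abs(hash(code_text)) % (2**32))
    return local.normal(size=16)

# A few instructor-labelled examples per feedback class (hypothetical labels).
support = {
    "off_by_one":   ["for i in range(n+1)", "while i <= len(xs)"],
    "missing_base": ["def fact(n): return n * fact(n-1)"],
}

# Prototype = mean embedding of each class's support examples.
prototypes = {label: np.mean([embed(x) for x in xs], axis=0)
              for label, xs in support.items()}

def classify(code_text):
    """Assign the nearest prototype's feedback label to a new submission."""
    z = embed(code_text)
    return min(prototypes, key=lambda lbl: np.linalg.norm(z - prototypes[lbl]))

print(classify("for i in range(len(xs)+1)"))
```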
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
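A toy sketch of that three-level hierarchy, using mean pooling in place of the paper's self-attention encoders (illustrative only; the word embeddings and rating head are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)

def embed_word(word):
    """Placeholder word embedding."""
    local = np.random.default_rng(abs(hash(word)) % (2**32))
    return local.normal(size=8)

def encode_sentence(sentence):      # level one
    return np.mean([embed_word(w) for w in sentence.split()], axis=0)

def encode_review(sentences):       # level two (intra-review)
    return np.mean([encode_sentence(s) for s in sentences], axis=0)

def encode_submission(reviews):     # level three (inter-review)
    return np.mean([encode_review(r) for r in reviews], axis=0)

reviews = [
    ["The method is sound.", "Experiments are thorough."],
    ["Writing needs work.", "Results are promising."],
]
paper_vector = encode_submission(reviews)
score = float(paper_vector @ rng.normal(size=8))   # placeholder rating head
print(round(score, 3))
```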
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
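A minimal sketch of that synthetic-data construction, assuming a simple word-dropout and word-replacement noising scheme rather than the paper's actual noising functions:

```python
import random

random.seed(0)

reviews = [
    "great battery life and a sharp screen",
    "battery lasts all day but the screen scratches easily",
    "camera is average yet the battery is excellent",
]

def add_noise(text, drop_prob=0.2, swap_prob=0.1):
    """Create a noisy version of a review by dropping and replacing words."""
    vocab = [w for r in reviews for w in r.split()]
    noisy = []
    for word in text.split():
        if random.random() < drop_prob:
            continue                      # word dropout
        if random.random() < swap_prob:
            word = random.choice(vocab)   # random replacement
        noisy.append(word)
    return " ".join(noisy)

# Pretend one sampled review is the "summary"; its noisy versions act as the
# pseudo-reviews the model must learn to denoise back into that summary.
pseudo_summary = random.choice(reviews)
pseudo_reviews = [add_noise(pseudo_summary) for _ in range(3)]
training_pair = (pseudo_reviews, pseudo_summary)
print(training_pair)
```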
arXiv Detail & Related papers (2020-04-21T16:54:57Z)