Towards Automated Classification of Code Review Feedback to Support
Analytics
- URL: http://arxiv.org/abs/2307.03852v1
- Date: Fri, 7 Jul 2023 21:53:20 GMT
- Title: Towards Automated Classification of Code Review Feedback to Support
Analytics
- Authors: Asif Kamal Turzo and Fahim Faysal and Ovi Poddar and Jaydeb Sarker and
Anindya Iqbal and Amiangshu Bosu
- Abstract summary: This study aims to develop an automated code review comment classification system.
We trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics.
Our approach outperforms Fregnan et al.'s approach by achieving 18.7% higher accuracy.
- Score: 4.423428708304586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: As improving code review (CR) effectiveness is a priority for
many software development organizations, projects have deployed CR analytics
platforms to identify potential improvement areas. The number of issues
identified, which is a crucial metric to measure CR effectiveness, can be
misleading if all issues are placed in the same bin. Therefore, a finer-grained
classification of issues identified during CRs can provide actionable insights
to improve CR effectiveness. Although a recent work by Fregnan et al. proposed
automated models to classify CR-induced changes, we have noticed two potential
improvement areas -- i) classifying comments that do not induce changes and ii)
using deep neural networks (DNN) in conjunction with code context to improve
performance. Aims: This study aims to develop an automated CR comment
classifier that leverages DNN models to achieve a more reliable performance
than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR
comments, we trained and evaluated supervised learning-based DNN models
leveraging code context, comment text, and a set of code metrics to classify CR
comments into one of the five high-level categories proposed by Turzo and Bosu.
Results: Based on our 10-fold cross-validation evaluations of multiple
combinations of tokenization approaches, we found that a model using CodeBERT
achieved the best accuracy of 59.3%. Our approach outperforms Fregnan et al.'s
approach, achieving 18.7% higher accuracy. Conclusion: Besides facilitating
improved CR analytics, our proposed model can be useful for developers in
prioritizing code review feedback and selecting reviewers.
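For readers who want a concrete starting point, the sketch below shows a minimal version of the text branch of such a classifier: fine-tuning CodeBERT on each review comment paired with its code context and evaluating with 10-fold cross-validation. It omits the code-metric features described in the paper, and the file name, column names, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): fine-tune CodeBERT to classify
# code review comments into five high-level categories, assuming a CSV with
# hypothetical columns 'code_context', 'comment_text', and integer 'label'.
import pandas as pd
import torch
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

NUM_LABELS = 5  # the five high-level categories proposed by Turzo and Bosu
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

class CRCommentDataset(torch.utils.data.Dataset):
    """Encodes each comment together with its code context as a sentence pair."""
    def __init__(self, contexts, comments, labels):
        self.enc = tokenizer(list(contexts), list(comments),
                             truncation=True, padding=True, max_length=512)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

df = pd.read_csv("cr_comments.csv")  # hypothetical path to the labeled data
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_accuracies = []

for fold, (train_idx, test_idx) in enumerate(skf.split(df, df["label"])):
    train_ds = CRCommentDataset(df.iloc[train_idx]["code_context"],
                                df.iloc[train_idx]["comment_text"],
                                df.iloc[train_idx]["label"])
    test_ds = CRCommentDataset(df.iloc[test_idx]["code_context"],
                               df.iloc[test_idx]["comment_text"],
                               df.iloc[test_idx]["label"])

    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=NUM_LABELS)
    args = TrainingArguments(output_dir=f"fold_{fold}", num_train_epochs=3,
                             per_device_train_batch_size=16,
                             logging_steps=50, report_to="none")
    trainer = Trainer(model=model, args=args, train_dataset=train_ds)
    trainer.train()

    preds = trainer.predict(test_ds).predictions.argmax(axis=-1)
    fold_accuracies.append(accuracy_score(df.iloc[test_idx]["label"], preds))

print("Mean 10-fold accuracy:", sum(fold_accuracies) / len(fold_accuracies))
```

Feeding the code context and the comment to the tokenizer as a sentence pair lets the model attend to both inputs jointly, which mirrors the paper's motivation for combining code context with comment text.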
Related papers
- Harnessing Large Language Models for Curated Code Reviews [2.5944208050492183]
In code review, generating structured and relevant comments is crucial for identifying code issues and facilitating accurate code changes.
Existing code review datasets are often noisy and unrefined, posing limitations to the learning potential of AI models.
We propose a curation pipeline designed to enhance the quality of the largest publicly available code review dataset.
arXiv Detail & Related papers (2025-02-05T18:15:09Z) - RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques [59.861013614500024]
We introduce a new benchmark designed to assess the critique capabilities of Large Language Models (LLMs).
Unlike existing benchmarks, which typically function in an open-loop fashion, our approach employs a closed-loop methodology that evaluates the quality of corrections generated from critiques.
arXiv Detail & Related papers (2025-01-24T13:48:10Z) - Hold On! Is My Feedback Useful? Evaluating the Usefulness of Code Review Comments [0.0]
This paper investigates the usefulness of Code Review Comments (CR comments) through textual feature-based and featureless approaches.
Our models outperform the baseline by achieving state-of-the-art performance.
Our analyses portray the similarities and differences of domains, projects, datasets, models, and features for predicting the usefulness of CR comments.
arXiv Detail & Related papers (2025-01-12T07:22:13Z) - Enabling Scalable Oversight via Self-Evolving Critic [59.861013614500024]
SCRIT (Self-evolving CRITic) is a framework that enables genuine self-evolution of critique abilities.
It self-improves by training on synthetic data, generated by a contrastive-based self-critic.
It achieves up to a 10.3% improvement on critique-correction and error identification benchmarks.
arXiv Detail & Related papers (2025-01-10T05:51:52Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework [2.4861619769660637]
We propose an estimands framework adapted from international clinical trials guidelines.
This framework provides a systematic structure for inference and reporting in evaluations.
We demonstrate how the framework can help uncover underlying issues, their causes, and potential solutions.
arXiv Detail & Related papers (2024-06-14T18:47:37Z) - Overcoming Pitfalls in Graph Contrastive Learning Evaluation: Toward
Comprehensive Benchmarks [60.82579717007963]
We introduce an enhanced evaluation framework designed to more accurately gauge the effectiveness, consistency, and overall capability of Graph Contrastive Learning (GCL) methods.
arXiv Detail & Related papers (2024-02-24T01:47:56Z) - Reassessing Java Code Readability Models with a Human-Centered Approach [3.798885293742468]
This research assesses existing Java Code Readability (CR) models for Large Language Model (LLM) adjustments.
We identify 12 key code aspects influencing CR that were assessed by 390 programmers when labeling 120 AI-generated snippets.
Our findings indicate that when AI generates concise and executable code, it is often considered readable by CR models and developers.
arXiv Detail & Related papers (2024-01-26T15:18:22Z) - What Makes a Code Review Useful to OpenDev Developers? An Empirical
Investigation [4.061135251278187]
Even a minor improvement in the effectiveness of Code Reviews can incur significant savings for a software development organization.
This study aims to develop a finer-grained understanding of what makes a code review comment useful to OSS developers.
arXiv Detail & Related papers (2023-02-22T22:48:27Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - CRACT: Cascaded Regression-Align-Classification for Robust Visual
Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align-Classification (CRAC).
CRAC yields new state-of-the-art performance on many benchmarks.
In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)