Towards Automated Classification of Code Review Feedback to Support Analytics
- URL: http://arxiv.org/abs/2307.03852v1
- Date: Fri, 7 Jul 2023 21:53:20 GMT
- Title: Towards Automated Classification of Code Review Feedback to Support Analytics
- Authors: Asif Kamal Turzo, Fahim Faysal, Ovi Poddar, Jaydeb Sarker, Anindya Iqbal, and Amiangshu Bosu
- Abstract summary: This study aims to develop an automated code review comment classification system.
We trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics.
Our approach outperforms Fregnan et al.'s approach by achieving 18.7% higher accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: As improving code review (CR) effectiveness is a priority for
many software development organizations, projects have deployed CR analytics
platforms to identify potential improvement areas. The number of issues
identified, which is a crucial metric to measure CR effectiveness, can be
misleading if all issues are placed in the same bin. Therefore, a finer-grained
classification of issues identified during CRs can provide actionable insights
to improve CR effectiveness. Although a recent work by Fregnan et al. proposed
automated models to classify CR-induced changes, we have noticed two potential
improvement areas -- i) classifying comments that do not induce changes and ii)
using deep neural networks (DNN) in conjunction with code context to improve
performance. Aims: This study aims to develop an automated CR comment
classifier that leverages DNN models to achieve more reliable performance
than Fregnan et al.'s approach. Method: Using a manually labeled dataset of 1,828 CR
comments, we trained and evaluated supervised learning-based DNN models
leveraging code context, comment text, and a set of code metrics to classify CR
comments into one of the five high-level categories proposed by Turzo and Bosu.
Results: In 10-fold cross-validation evaluations of multiple
combinations of tokenization approaches, we found that a model using CodeBERT
achieved the best accuracy of 59.3%. Our approach outperforms Fregnan et al.'s
approach by achieving 18.7% higher accuracy. Conclusion: Besides facilitating
improved CR analytics, our proposed model can be useful for developers in
prioritizing code review feedback and selecting reviewers.
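
As a concrete illustration, the sketch below shows how a CodeBERT-based classifier along these lines could be set up. It is a minimal sketch, not the authors' released code: the hyperparameters and the sentence-pair encoding of comment text with code context are assumptions for illustration, and the code-metric features the paper also feeds into its models are omitted here.

```python
# Minimal sketch of a CodeBERT-based CR comment classifier, assuming five
# high-level categories in the style of Turzo and Bosu. Architecture
# details and hyperparameters are illustrative, not the paper's exact setup.
import torch
from torch.optim import AdamW
from sklearn.model_selection import StratifiedKFold
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CLASSES = 5  # the five high-level categories of Turzo and Bosu

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=NUM_CLASSES)
optimizer = AdamW(model.parameters(), lr=2e-5)

def train_step(comments, code_contexts, labels):
    """One supervised step: encode (comment, code context) pairs and
    minimize cross-entropy over the five categories."""
    enc = tokenizer(comments, code_contexts, truncation=True,
                    padding=True, max_length=512, return_tensors="pt")
    out = model(**enc, labels=torch.tensor(labels))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# 10-fold cross-validation, mirroring the paper's evaluation protocol:
# for train_idx, test_idx in skf.split(comments, labels): ...
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
```

Feeding the comment and its surrounding code as a sentence pair is one simple way to expose both signals to the model; the paper additionally experiments with code metrics and multiple tokenization schemes.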
Related papers
- Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z)
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on Divergent Chain of Thought (DCoT) datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework [2.4861619769660637]
We propose an estimands framework adapted from international clinical trials guidelines.
This framework provides a systematic structure for inference and reporting in evaluations.
We demonstrate how the framework can help uncover underlying issues, their causes, and potential solutions.
arXiv Detail & Related papers (2024-06-14T18:47:37Z)
- Overcoming Pitfalls in Graph Contrastive Learning Evaluation: Toward Comprehensive Benchmarks [60.82579717007963]
We introduce an enhanced evaluation framework designed to more accurately gauge the effectiveness, consistency, and overall capability of Graph Contrastive Learning (GCL) methods.
arXiv Detail & Related papers (2024-02-24T01:47:56Z)
- Reassessing Java Code Readability Models with a Human-Centered Approach [3.798885293742468]
This research assesses whether existing Java Code Readability (CR) models are suitable for adjusting Large Language Models (LLMs).
We identify 12 key code aspects influencing CR that were assessed by 390 programmers when labeling 120 AI-generated snippets.
Our findings indicate that when AI generates concise and executable code, it is often considered readable by CR models and developers.
arXiv Detail & Related papers (2024-01-26T15:18:22Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models [4.2605449879340656]
Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC).
Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR).
We propose a framework to improve an LLM's performance on CR-containing reviews by fine-tuning on highly inferential tasks.
arXiv Detail & Related papers (2023-07-11T12:43:28Z)
- What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation [4.061135251278187]
Even a minor improvement in the effectiveness of Code Reviews can yield significant savings for a software development organization.
This study aims to develop a finer-grained understanding of what makes a code review comment useful to OSS developers.
arXiv Detail & Related papers (2023-02-22T22:48:27Z)
- Augmentation-induced Consistency Regularization for Classification [25.388324221293203]
We propose a consistency regularization framework based on data augmentation, called CR-Aug.
CR-Aug forces the output distributions of different sub-models generated by data augmentation to be consistent with each other.
We apply CR-Aug to image and audio classification tasks and conduct extensive experiments to verify its effectiveness (a generic sketch of the idea follows this entry).
arXiv Detail & Related papers (2022-05-25T03:15:36Z)
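
For readers unfamiliar with the technique, here is a minimal, generic sketch of augmentation-induced consistency regularization, assuming an arbitrary classifier and a stochastic `augment` function; CR-Aug's actual sub-model construction and loss may differ.

```python
# Generic consistency-regularization sketch (not CR-Aug's exact
# formulation): two independently augmented views of the same batch are
# pushed toward matching output distributions with a symmetric KL term.
import torch.nn.functional as F

def consistency_loss(model, x, labels, augment, alpha=1.0):
    logits_a = model(augment(x))  # first augmented view
    logits_b = model(augment(x))  # second, independently augmented view
    ce = F.cross_entropy(logits_a, labels)  # usual supervised loss
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    kl = 0.5 * (
        F.kl_div(log_p_a, log_p_b, reduction="batchmean", log_target=True)
        + F.kl_div(log_p_b, log_p_a, reduction="batchmean", log_target=True))
    return ce + alpha * kl  # alpha balances supervision vs. consistency
```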
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks (a hedged sketch of the idea follows this entry).
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
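
As a rough illustration of the idea, the sketch below perturbs intermediate feature embeddings in the gradient direction that increases the loss (an FGSM-style step); the paper's actual augmentation and normalization scheme may differ, and all names here are illustrative.

```python
# FGSM-style adversarial perturbation of intermediate features: a generic
# sketch of adversarial feature augmentation, not the paper's exact method.
import torch

def adversarial_features(feats, head, labels, loss_fn, eps=0.1):
    feats = feats.detach().requires_grad_(True)
    loss = loss_fn(head(feats), labels)  # loss on the clean features
    (grad,) = torch.autograd.grad(loss, feats)
    # Step in the direction that increases the loss; training on both the
    # clean and the perturbed embeddings can improve generalization.
    return feats + eps * grad.sign()
```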
- CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align-Classification (CRAC).
CRAC yields new state-of-the-art performance on many benchmarks.
In experiments on seven benchmarks, including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k, and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)