Unity is Strength: Cross-Task Knowledge Distillation to Improve Code
Review Generation
- URL: http://arxiv.org/abs/2309.03362v1
- Date: Wed, 6 Sep 2023 21:10:33 GMT
- Title: Unity is Strength: Cross-Task Knowledge Distillation to Improve Code
Review Generation
- Authors: Oussama Ben Sghaier, Lucas Maes, Houari Sahraoui
- Abstract summary: We propose a novel deep-learning architecture, DISCOREV, based on cross-task knowledge distillation.
In our approach, the fine-tuning of the comment generation model is guided by the code refinement model.
Our results show that our approach generates better review comments as measured by the BLEU score.
- Score: 0.9208007322096533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code review is a fundamental process in software development that plays a
critical role in ensuring code quality and reducing the likelihood of errors
and bugs. However, code review might be complex, subjective, and
time-consuming. Comment generation and code refinement are two key tasks of
this process and their automation has traditionally been addressed separately
in the literature using different approaches. In this paper, we propose a novel
deep-learning architecture, DISCOREV, based on cross-task knowledge
distillation that addresses these two tasks simultaneously. In our approach,
the fine-tuning of the comment generation model is guided by the code
refinement model. We implemented this guidance using two strategies: a
feedback-based learning objective and an embedding alignment objective. We
evaluated our approach based on cross-task knowledge distillation by comparing
it to the state-of-the-art methods that are based on independent training and
fine-tuning. Our results show that our approach generates better review
comments as measured by the BLEU score.
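The embedding alignment objective mentioned in the abstract can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's exact formulation: the MSE-based alignment term, the weighting factor `lam`, and the function names are all hypothetical.

```python
import numpy as np

def embedding_alignment_loss(student_emb, teacher_emb):
    # Mean squared distance between the comment-generation (student)
    # and code-refinement (teacher) encoder embeddings.
    return float(np.mean((student_emb - teacher_emb) ** 2))

def guided_objective(task_loss, student_emb, teacher_emb, lam=0.5):
    # Hypothetical combined objective: the student's own task loss plus a
    # weighted alignment term pulling its embeddings toward the teacher's.
    return task_loss + lam * embedding_alignment_loss(student_emb, teacher_emb)
```

Here `lam` is an assumed trade-off hyperparameter; in DISCOREV this kind of guidance is applied during fine-tuning of the comment generation model, with the code refinement model acting as the teacher.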
Related papers
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution [74.41064280094064]
CompassJudger-1 is the first open-source all-in-one judge LLM.
CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility.
JudgerBench is a new benchmark that encompasses various subjective evaluation tasks.
arXiv Detail & Related papers (2024-10-21T17:56:51Z) - DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z) - AI-powered Code Review with LLMs: Early Results [10.37036924997437]
We present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based agent.
Our proposed LLM-based AI agent model is trained on large code repositories.
It aims to detect code smells, identify potential bugs, provide suggestions for improvement, and optimize the code.
arXiv Detail & Related papers (2024-04-29T08:27:50Z) - Improving the Learning of Code Review Successive Tasks with Cross-Task
Knowledge Distillation [1.0878040851638]
We introduce a novel deep-learning architecture, named DISCOREV, which employs cross-task knowledge distillation to address these tasks simultaneously.
We show that our approach generates better review comments, as measured by the BLEU score, as well as more accurate code refinement according to the CodeBLEU score.
arXiv Detail & Related papers (2024-02-03T07:02:22Z) - ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review
Quality Estimation [0.6895577977557867]
Inspection of review process effectiveness and continuous improvement can boost development productivity.
We propose ReviewRanker, a semi-supervised learning-based system that assigns each code review a confidence score.
Our proposed method is trained on simple and well-defined labels provided by developers.
arXiv Detail & Related papers (2023-07-08T15:37:48Z) - Deep Learning Based Code Generation Methods: Literature Review [30.17038624027751]
This paper focuses on Code Generation task that aims at generating relevant code fragments according to given natural language descriptions.
In this paper, we systematically review the current work on deep learning-based code generation methods.
arXiv Detail & Related papers (2023-03-02T08:25:42Z) - Benchopt: Reproducible, efficient and collaborative optimization
benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z) - Lessons from Chasing Few-Shot Learning Benchmarks: Rethinking the
Evaluation of Meta-Learning Methods [9.821362920940631]
We introduce a simple baseline for meta-learning, FIX-ML.
We explore two possible goals of meta-learning: to develop methods that generalize (i) to the same task distribution that generates the training set (in-distribution), or (ii) to new, unseen task distributions (out-of-distribution).
Our results highlight that in order to reason about progress in this space, it is necessary to provide a clearer description of the goals of meta-learning, and to develop more appropriate evaluation strategies.
arXiv Detail & Related papers (2021-02-23T05:34:30Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.