KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation
- URL: http://arxiv.org/abs/2401.08376v1
- Date: Tue, 16 Jan 2024 14:07:48 GMT
- Title: KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation
- Authors: Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang,
Wenqiang Zhang
- Abstract summary: We propose a novel knowledge-aware denoising learning method called KADEL.
Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits.
Our method achieves overall state-of-the-art performance compared with previous methods.
- Score: 43.8807366757381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Commit messages are natural language descriptions of code changes,
which are important for software evolution tasks such as code understanding and
maintenance. However, previous methods are trained on the entire dataset without
considering the fact that a portion of commit messages adhere to good practice
(i.e., good-practice commits) while the rest do not. Based on our empirical
study, we find that training on good-practice commits contributes significantly
to commit message generation. Motivated by this finding, we propose a novel
knowledge-aware denoising learning method called KADEL. Since good-practice
commits constitute only a small proportion of the dataset, we align the
remaining training samples with these good-practice commits. To achieve this, we
propose a model that learns commit knowledge by training on good-practice
commits; this knowledge model supplements additional information for training
samples that do not conform to good practice. However, because the supplementary
information may contain noise or prediction errors, we propose a dynamic
denoising training method that combines a distribution-aware confidence function
with a dynamic distribution list, enhancing the effectiveness of the training
process. Experimental results on the whole MCMD dataset demonstrate that our
method achieves overall state-of-the-art performance compared with previous
methods. Our source code and data are available at
https://github.com/DeepSoftwareAnalytics/KADEL
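To make the training scheme above more concrete, the sketch below gives one plausible, simplified reading of knowledge-aware denoising in PyTorch-style Python: a knowledge model trained only on good-practice commits provides soft targets for the remaining samples, and a per-sample confidence decides how much of that (possibly noisy) supplementary signal enters the loss. All names and details here (knowledge_model, distribution_aware_confidence, the agreement-based confidence, the loss mixing) are illustrative assumptions, not the authors' implementation; the actual KADEL code is in the repository linked above.

```python
# Illustrative sketch only: a simplified, assumption-based rendering of
# knowledge-aware denoising training. It is NOT the authors' implementation;
# see https://github.com/DeepSoftwareAnalytics/KADEL for the real code.
import torch
import torch.nn.functional as F


def knowledge_soft_targets(knowledge_model, diff_ids):
    """Soft targets from a model trained only on good-practice commits."""
    with torch.no_grad():
        logits = knowledge_model(diff_ids)          # (B, T, V)
    return F.softmax(logits, dim=-1)


def distribution_aware_confidence(student_logits, soft_targets):
    """Hypothetical confidence: agreement between the student's predictive
    distribution and the knowledge model's distribution (1 = identical)."""
    student_probs = F.softmax(student_logits, dim=-1)
    # Total-variation-style agreement per token, averaged over the sequence.
    agreement = 1.0 - 0.5 * (student_probs - soft_targets).abs().sum(dim=-1)
    return agreement.mean(dim=-1)                   # (B,)


def denoising_loss(student_logits, target_ids, soft_targets, is_good_practice):
    """Blend ground-truth supervision with (possibly noisy) knowledge supervision.

    Good-practice samples are trained on their own messages; the others mix in
    the knowledge model's soft targets, weighted by a per-sample confidence.
    """
    vocab = student_logits.size(-1)
    # Per-sample cross-entropy against the original commit message tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, vocab), target_ids.view(-1), reduction="none"
    ).view(target_ids.shape).mean(dim=-1)           # (B,)

    # Per-sample KL divergence towards the knowledge model's distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1), soft_targets, reduction="none"
    ).sum(dim=-1).mean(dim=-1)                      # (B,)

    conf = distribution_aware_confidence(student_logits, soft_targets)  # (B,)
    # Good-practice commits (is_good_practice: bool tensor of shape (B,)):
    # pure cross-entropy. Others: interpolate towards the knowledge
    # distribution according to the confidence score.
    mix = torch.where(is_good_practice, torch.zeros_like(conf), conf)
    return ((1.0 - mix) * ce + mix * kl).mean()
```

The dynamic distribution list mentioned in the abstract, which would refresh the supplementary targets as training progresses, is omitted from this sketch for brevity.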
Related papers
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that gradually exposing the contents of natural images can be readily achieved by adjusting the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Learning Representations for New Sound Classes With Continual Self-Supervised Learning [30.35061954854764]
We present a self-supervised learning framework for continually learning representations for new sound classes.
We show that representations learned with the proposed method generalize better and are less susceptible to catastrophic forgetting.
arXiv Detail & Related papers (2022-05-15T22:15:21Z)
- Training Dynamics for Text Summarization Models [45.62439188988816]
We analyze the training dynamics for generation models, focusing on news summarization.
Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, we study what the model learns at different stages of its fine-tuning process.
We find that properties such as copy behavior are learnt earlier in the training process and these observations are robust across domains.
On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, and this behavior is more varied across domains.
arXiv Detail & Related papers (2021-10-15T21:13:41Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Coded Machine Unlearning [34.08435990347253]
We present a coded learning protocol where the dataset is linearly coded before the learning phase.
We also present the corresponding unlearning protocol for the coded learning model along with a discussion on the proposed protocol's success in ensuring perfect unlearning.
arXiv Detail & Related papers (2020-12-31T17:20:34Z)
- Teaching with Commentaries [108.62722733649542]
We propose a flexible teaching framework using commentaries and learned meta-information.
We find that commentaries can improve training speed and/or performance.
Commentaries can be reused when training new models to obtain performance benefits.
arXiv Detail & Related papers (2020-11-05T18:52:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.