SMILE: Self-Distilled MIxup for Efficient Transfer LEarning
- URL: http://arxiv.org/abs/2103.13941v1
- Date: Thu, 25 Mar 2021 16:02:21 GMT
- Title: SMILE: Self-Distilled MIxup for Efficient Transfer LEarning
- Authors: Xingjian Li, Haoyi Xiong, Chengzhong Xu, Dejing Dou
- Abstract summary: In this work, we propose SMILE - Self-Distilled Mixup for EffIcient Transfer LEarning.
With mixed images as inputs, SMILE regularizes the outputs of CNN feature extractors to learn from the mixed feature vectors of inputs.
The triplet regularizer balances the mixup effects in both feature and label spaces while bounding the linearity in-between samples for pre-training tasks.
- Score: 42.59451803498095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the performance of deep learning, mixup has been proposed to encourage neural networks to favor simple linear behaviors in-between training samples. Performing mixup for transfer learning with pre-trained models, however, is not that simple: a high-capacity pre-trained model with a large fully-connected (FC) layer can easily overfit to the target dataset even with samples-to-labels mixup. In this work, we propose SMILE - Self-Distilled Mixup for EffIcient Transfer LEarning. With mixed images as inputs, SMILE regularizes the outputs of CNN feature extractors to learn from the mixed feature vectors of the inputs (sample-to-feature mixup), in addition to the mixed labels. Specifically, SMILE incorporates a mean teacher, inherited from the pre-trained model, to provide the feature vectors of the input samples in a self-distilling fashion, and mixes up the feature vectors accordingly via a novel triplet regularizer. The triplet regularizer balances the mixup effects in both feature and label spaces while bounding the linearity in-between samples for pre-training tasks. Extensive experiments verify the performance improvements made by SMILE in comparison with a wide spectrum of transfer learning algorithms, including fine-tuning, L2-SP, DELTA, and RIFLE, even when these are combined with mixup strategies. Ablation studies show that vanilla sample-to-label mixup strategies marginally increase the linearity in-between training samples but lack generalizability, while SMILE significantly improves the mixup effects in both label and feature spaces on both the training and testing datasets. These empirical observations back up our design intuition and purposes.
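As a reading aid, the sketch below renders the abstract's description as a hypothetical PyTorch-style training step: sample-to-label mixup on the inputs plus sample-to-feature mixup against a mean teacher inherited from the pre-trained weights. The `features`/`classifier` interface, the MSE feature loss, the loss weights, and the EMA update rule are illustrative assumptions rather than the authors' released implementation, and the third term of the triplet regularizer (bounding linearity for the pre-training task) is omitted here.

```python
import torch
import torch.nn.functional as F

def smile_step(model, teacher, images, labels, num_classes,
               alpha=0.2, w_label=1.0, w_feat=1.0, ema_decay=0.999):
    """One hypothetical SMILE-style training step (illustrative sketch only).

    model   -- student network exposing model.features(x) -> feature vectors
               and model.classifier(f) -> logits (an assumed interface)
    teacher -- "mean teacher" copy inherited from the pre-trained model,
               updated here by an exponential moving average (EMA)
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))

    # Sample-to-label mixup: mix the inputs and their one-hot labels.
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    one_hot = F.one_hot(labels, num_classes).float()
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]

    # Student forward pass on the mixed inputs.
    mixed_feats = model.features(mixed_images)
    logits = model.classifier(mixed_feats)
    # Soft cross-entropy against the mixed labels.
    loss_label = -(mixed_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    # Sample-to-feature mixup: the mean teacher provides feature vectors of the
    # unmixed inputs in a self-distilling fashion; the student's features of the
    # mixed input are pulled toward the mixture of the teacher's features.
    with torch.no_grad():
        teacher_feats = teacher.features(images)
        mixed_teacher_feats = lam * teacher_feats + (1.0 - lam) * teacher_feats[perm]
    loss_feat = F.mse_loss(mixed_feats, mixed_teacher_feats)

    # Two of the three terms balanced by the paper's triplet regularizer; the
    # term bounding linearity for the pre-training task is omitted in this sketch.
    loss = w_label * loss_label + w_feat * loss_feat

    # EMA update of the mean teacher.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), model.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1.0 - ema_decay)

    return loss
```

A hypothetical usage would initialize both `model` and `teacher` from the same pre-trained checkpoint (e.g. `teacher = copy.deepcopy(model)`), then call `smile_step` inside the training loop and backpropagate the returned loss.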
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning [76.00798972439004]
Collaborative Sample Selection (CSS) removes noisy samples from the identified clean set.
We introduce a co-training mechanism with a contrastive loss in semi-supervised learning.
arXiv Detail & Related papers (2023-10-24T05:37:20Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
- A Data Cartography based MixUp for Pre-trained Language Models [47.90235939359225]
MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels (a minimal formulation is sketched after this list).
We propose TDMixUp, a novel MixUp strategy that leverages Training Dynamics and allows more informative samples to be combined for generating new data samples.
We empirically validate that our method not only achieves competitive performance using a smaller subset of the training data compared with strong baselines, but also yields lower expected calibration error on the pre-trained language model, BERT, on both in-domain and out-of-domain settings in a wide range of NLP tasks.
arXiv Detail & Related papers (2022-05-06T17:59:19Z)
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
- Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification [27.043136219527767]
We propose a novel contrastive learning boosted multi-label prediction model.
By using contrastive learning in the supervised setting, we can exploit label information effectively.
We show that the learnt embeddings provide insights into the interpretation of label-label interactions.
arXiv Detail & Related papers (2021-12-02T04:23:34Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels.
arXiv Detail & Related papers (2020-01-13T21:00:46Z)
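For reference, the vanilla sample-to-label MixUp described in several entries above (e.g. TDMixUp and Mixup-Transformer) is a convex combination of random sample pairs and their labels. A minimal sketch, assuming a PyTorch batch of inputs and one-hot targets:

```python
import torch

def mixup_batch(inputs, targets_one_hot, alpha=0.2):
    """Vanilla sample-to-label mixup: convex combination of random sample pairs.

    inputs          -- tensor of shape (batch, ...)
    targets_one_hot -- tensor of shape (batch, num_classes)
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(inputs.size(0))
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[perm]
    mixed_targets = lam * targets_one_hot + (1.0 - lam) * targets_one_hot[perm]
    return mixed_inputs, mixed_targets
```

Training then minimizes the usual loss on the mixed batch; per the abstract above, SMILE extends this interpolation from the label space into the feature space of a self-distilled mean teacher.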
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.