Improved Regularization and Robustness for Fine-tuning in Neural
Networks
- URL: http://arxiv.org/abs/2111.04578v1
- Date: Mon, 8 Nov 2021 15:39:44 GMT
- Title: Improved Regularization and Robustness for Fine-tuning in Neural
Networks
- Authors: Dongyue Li and Hongyang R. Zhang
- Abstract summary: A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data.
We propose regularized self-labeling -- an interpolation between regularization and self-labeling methods.
Our approach improves baseline methods by 1.76% (on average) for seven image classification tasks and 0.75% for a few-shot classification task.
- Score: 5.626364462708321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A widely used algorithm for transfer learning is fine-tuning, where a
pre-trained model is fine-tuned on a target task with a small amount of labeled
data. When the capacity of the pre-trained model is much larger than the size
of the target data set, fine-tuning is prone to overfitting and "memorizing"
the training labels. Hence, an important question is to regularize fine-tuning
and ensure its robustness to noise. To address this question, we begin by
analyzing the generalization properties of fine-tuning. We present a PAC-Bayes
generalization bound that depends on the distance traveled in each layer during
fine-tuning and the noise stability of the fine-tuned model. We empirically
measure these quantities. Based on the analysis, we propose regularized
self-labeling -- the interpolation between regularization and self-labeling
methods, including (i) layer-wise regularization to constrain the distance
traveled in each layer; (ii) self label-correction and label-reweighting to
correct mislabeled data points (on which the model is confident) and reweight less
confident data points. We validate our approach on an extensive collection of
image and text data sets using multiple pre-trained model architectures. Our
approach improves baseline methods by 1.76% (on average) for seven image
classification tasks and 0.75% for a few-shot classification task. When the
target data set includes noisy labels, our approach outperforms baseline
methods by 3.56% on average in two noisy settings.
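For a concrete picture of the two ingredients above, here is a minimal PyTorch-style sketch of regularized self-labeling. The function names, thresholds, and the choice of a squared L2 distance penalty are illustrative assumptions; this is not the authors' released implementation.
```python
import torch
import torch.nn.functional as F

def layerwise_distance_penalty(model, pretrained_state, alphas):
    """Sum of per-layer squared L2 distances from the pre-trained weights.
    `pretrained_state` is a detached copy of the weights taken before fine-tuning;
    `alphas` maps parameter names to per-layer regularization strengths (illustrative)."""
    penalty = 0.0
    for name, param in model.named_parameters():
        ref = pretrained_state[name].to(param.device)
        penalty = penalty + alphas.get(name, 1.0) * (param - ref).pow(2).sum()
    return penalty

def regularized_self_labeling_loss(model, pretrained_state, x, y, alphas,
                                   confident_thresh=0.9, reg_weight=1e-3):
    """Weighted cross-entropy with (i) self label-correction / label-reweighting
    and (ii) a layer-wise distance penalty. Thresholds and weights are illustrative."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)

    # (i) replace labels on points where the model is confident, and
    #     down-weight the remaining points by the model's confidence
    labels = torch.where(conf > confident_thresh, pred, y)
    weights = torch.where(conf > confident_thresh, torch.ones_like(conf), conf)
    ce = F.cross_entropy(logits, labels, reduction="none")
    loss = (weights * ce).sum() / weights.sum().clamp(min=1e-8)

    # (ii) constrain the distance traveled in each layer during fine-tuning
    return loss + reg_weight * layerwise_distance_penalty(model, pretrained_state, alphas)
```
Here `pretrained_state` could be built once before fine-tuning, e.g. as `{k: v.detach().clone() for k, v in model.state_dict().items()}`.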
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
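Below is a hedged sketch of the importance-weighting idea in the co-training paper above: instead of filtering distantly supervised examples with a hard confidence threshold, each example's loss is scaled by a weight derived from the classifier's training dynamics. The running-mean statistic, class name, and hyperparameters are illustrative choices, not the paper's exact recipe.
```python
import torch
import torch.nn.functional as F

class TrainingDynamicsWeighter:
    """Per-example running mean of the probability the model assigns to its
    distantly supervised label, used as a soft importance weight instead of a
    hard confidence filter."""

    def __init__(self, num_examples, momentum=0.9):
        self.momentum = momentum
        # start at 1 so nothing is down-weighted before dynamics accumulate
        self.mean_prob = torch.ones(num_examples)

    def update(self, idx, logits, distant_labels):
        with torch.no_grad():
            p = F.softmax(logits, dim=1).gather(1, distant_labels.unsqueeze(1)).squeeze(1)
            self.mean_prob[idx] = (self.momentum * self.mean_prob[idx]
                                   + (1 - self.momentum) * p.cpu())

    def weighted_loss(self, idx, logits, distant_labels):
        ce = F.cross_entropy(logits, distant_labels, reduction="none")
        w = self.mean_prob[idx].to(ce.device)   # importance weights, not a hard cutoff
        return (w * ce).sum() / w.sum().clamp(min=1e-8)
```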
- Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models [38.7352992942213]
We propose a novel approach named HINT to improve pre-trained code models with large-scale unlabeled datasets.
HINT includes two main modules: HybrId pseudo-labeled data selection and Noise-tolerant Training.
The experimental results show that HINT can better leverage those unlabeled data in a task-specific way.
arXiv Detail & Related papers (2024-01-02T06:39:00Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
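A small sketch of how the negative-label-suppression idea above could plug into an off-the-shelf pseudo-labeling loss (FixMatch-style): classes rejected by a one-bit "no" answer are masked out before pseudo-labels are formed. The tensor layout, masking value, and confidence threshold are assumptions for illustration.
```python
import torch
import torch.nn.functional as F

def suppress_negative_labels(logits, negative_mask):
    """Mask out classes that a one-bit 'no' answer has ruled out.
    negative_mask: bool [batch, num_classes], True where the class was rejected."""
    return logits.masked_fill(negative_mask, float("-inf"))

def pseudo_label_loss(student_logits, teacher_logits, negative_mask, conf_thresh=0.95):
    """FixMatch-style pseudo-labeling with negative label suppression applied
    before pseudo-labels are formed (illustrative wiring)."""
    with torch.no_grad():
        probs = F.softmax(suppress_negative_labels(teacher_logits, negative_mask), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > conf_thresh
    if not keep.any():
        return student_logits.sum() * 0.0       # no confident pseudo-labels in this batch
    return F.cross_entropy(student_logits[keep], pseudo[keep])
```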
- Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification [49.36348058247138]
We tackle the problem of cross-domain few-shot classification by making a small proportion of unlabeled images in the target domain accessible in the training stage.
We meticulously design a cross-level knowledge distillation method, which can strengthen the ability of the model to extract more discriminative features in the target dataset.
Our approach can surpass the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks.
arXiv Detail & Related papers (2023-11-04T12:28:04Z)
- Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy [34.02350195269502]
We formalize the problem of data pruning with re-labeling.
We propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples.
arXiv Detail & Related papers (2023-11-02T05:40:26Z)
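The Prune4Rel summary above describes selecting a subset that maximizes the total neighborhood confidence of all training examples. The following is a rough simplification under a linear coverage assumption; the paper's actual objective and optimizer may differ.
```python
import numpy as np

def prune_by_neighborhood_confidence(neighbors, confidence, budget):
    """neighbors: [n, k] int array of each example's k nearest neighbors in feature space.
    confidence: [n] float array of model confidence per example.
    Keep `budget` examples whose selection adds the most confidence to the
    neighborhoods they appear in (a simplified, linear surrogate of the objective)."""
    n, _ = neighbors.shape
    # how often each example shows up as someone's neighbor
    times_listed = np.bincount(neighbors.ravel(), minlength=n)
    gain = confidence * times_listed           # marginal contribution under the surrogate
    selected = np.argsort(-gain)[:budget]      # top-k equals greedy for a linear objective
    return np.sort(selected)
```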
- Fine tuning Pre trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
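A minimal sketch of a neighbor-consistency regularizer in the spirit of the paper above: each example's prediction is pulled toward a similarity-weighted average of its feature-space neighbors' predictions. Restricting neighbors to the current batch and using a KL term are simplifying assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(logits, features, k=5, temperature=0.1):
    """Pull each example's prediction toward a similarity-weighted average of its
    k nearest in-batch neighbors' predictions (requires batch size > k)."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                              # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-matches
    topk_sim, topk_idx = sim.topk(k, dim=1)
    weights = F.softmax(topk_sim / temperature, dim=1)   # [batch, k]

    probs = F.softmax(logits, dim=1)
    neighbor_probs = probs[topk_idx]                     # [batch, k, num_classes]
    target = (weights.unsqueeze(-1) * neighbor_probs).sum(dim=1).detach()
    return F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
```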
- HydraMix-Net: A Deep Multi-task Semi-supervised Learning Approach for Cell Detection and Classification [14.005379068469361]
Semi-supervised techniques remove the need for large-scale labelled sets by exploiting unlabelled data to improve the performance of a model.
We propose HydraMix-Net, a semi-supervised deep multi-task classification and localization approach for medical imaging.
arXiv Detail & Related papers (2020-08-11T15:00:59Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
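As a sketch of how uncertainty estimates can gate self-training as in the paper above, the code below uses Monte Carlo dropout and predictive entropy to decide which pseudo-labeled examples to keep; the specific estimator and threshold are illustrative assumptions rather than the paper's exact procedure.
```python
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, n_samples=10):
    """Predictive mean and entropy from stochastic forward passes with dropout kept on.
    (A fuller implementation would keep batch-norm layers in eval mode.)"""
    model.train()                              # keep dropout stochastic at inference
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)
    entropy = -(mean * mean.clamp(min=1e-8).log()).sum(dim=1)
    return mean, entropy

def select_for_self_training(mean_probs, entropy, max_entropy=0.5):
    """Keep only low-uncertainty unlabeled examples for the next self-training round."""
    conf, pseudo = mean_probs.max(dim=1)
    keep = entropy < max_entropy
    return pseudo[keep], conf[keep], keep
```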
This list is automatically generated from the titles and abstracts of the papers on this site.