Improved Regularization and Robustness for Fine-tuning in Neural
Networks
- URL: http://arxiv.org/abs/2111.04578v1
- Date: Mon, 8 Nov 2021 15:39:44 GMT
- Title: Improved Regularization and Robustness for Fine-tuning in Neural
Networks
- Authors: Dongyue Li and Hongyang R. Zhang
- Abstract summary: A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data.
We propose regularized self-labeling -- an interpolation between regularization and self-labeling methods.
Our approach improves baseline methods by 1.76% (on average) for seven image classification tasks and 0.75% for a few-shot classification task.
- Score: 5.626364462708321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A widely used algorithm for transfer learning is fine-tuning, where a
pre-trained model is fine-tuned on a target task with a small amount of labeled
data. When the capacity of the pre-trained model is much larger than the size
of the target data set, fine-tuning is prone to overfitting and "memorizing"
the training labels. Hence, an important question is to regularize fine-tuning
and ensure its robustness to noise. To address this question, we begin by
analyzing the generalization properties of fine-tuning. We present a PAC-Bayes
generalization bound that depends on the distance traveled in each layer during
fine-tuning and the noise stability of the fine-tuned model. We empirically
measure these quantities. Based on the analysis, we propose regularized
self-labeling -- the interpolation between regularization and self-labeling
methods, including (i) layer-wise regularization to constrain the distance
traveled in each layer; (ii) self label-correction and label-reweighting to
correct mislabeled data points (on which the model is confident) and reweight less
confident data points. We validate our approach on an extensive collection of
image and text data sets using multiple pre-trained model architectures. Our
approach improves baseline methods by 1.76% (on average) for seven image
classification tasks and 0.75% for a few-shot classification task. When the
target data set includes noisy labels, our approach outperforms baseline
methods by 3.56% on average in two noisy settings.
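For a concrete picture of the two ingredients above, here is a minimal PyTorch-style sketch of regularized self-labeling. The function names, thresholds, and the choice of a squared L2 distance penalty are illustrative assumptions; this is not the authors' released implementation.
```python
import torch
import torch.nn.functional as F

def layerwise_distance_penalty(model, pretrained_state, alphas):
    """Sum of per-layer squared L2 distances from the pre-trained weights.
    `pretrained_state` is a detached copy of the weights taken before fine-tuning;
    `alphas` maps parameter names to per-layer regularization strengths (illustrative)."""
    penalty = 0.0
    for name, param in model.named_parameters():
        ref = pretrained_state[name].to(param.device)
        penalty = penalty + alphas.get(name, 1.0) * (param - ref).pow(2).sum()
    return penalty

def regularized_self_labeling_loss(model, pretrained_state, x, y, alphas,
                                   confident_thresh=0.9, reg_weight=1e-3):
    """Weighted cross-entropy with (i) self label-correction / label-reweighting
    and (ii) a layer-wise distance penalty. Thresholds and weights are illustrative."""
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)

    # (i) replace labels on points where the model is confident, and
    #     down-weight the remaining points by the model's confidence
    labels = torch.where(conf > confident_thresh, pred, y)
    weights = torch.where(conf > confident_thresh, torch.ones_like(conf), conf)
    ce = F.cross_entropy(logits, labels, reduction="none")
    loss = (weights * ce).sum() / weights.sum().clamp(min=1e-8)

    # (ii) constrain the distance traveled in each layer during fine-tuning
    return loss + reg_weight * layerwise_distance_penalty(model, pretrained_state, alphas)
```
Here `pretrained_state` could be built once before fine-tuning, e.g. as `{k: v.detach().clone() for k, v in model.state_dict().items()}`.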
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
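Below is a hedged sketch of the importance-weighting idea in the co-training paper above: instead of filtering distantly supervised examples with a hard confidence threshold, each example's loss is scaled by a weight derived from the classifier's training dynamics. The running-mean statistic, class name, and hyperparameters are illustrative choices, not the paper's exact recipe.
```python
import torch
import torch.nn.functional as F

class TrainingDynamicsWeighter:
    """Per-example running mean of the probability the model assigns to its
    distantly supervised label, used as a soft importance weight instead of a
    hard confidence filter."""

    def __init__(self, num_examples, momentum=0.9):
        self.momentum = momentum
        # start at 1 so nothing is down-weighted before dynamics accumulate
        self.mean_prob = torch.ones(num_examples)

    def update(self, idx, logits, distant_labels):
        with torch.no_grad():
            p = F.softmax(logits, dim=1).gather(1, distant_labels.unsqueeze(1)).squeeze(1)
            self.mean_prob[idx] = (self.momentum * self.mean_prob[idx]
                                   + (1 - self.momentum) * p.cpu())

    def weighted_loss(self, idx, logits, distant_labels):
        ce = F.cross_entropy(logits, distant_labels, reduction="none")
        w = self.mean_prob[idx].to(ce.device)   # importance weights, not a hard cutoff
        return (w * ce).sum() / w.sum().clamp(min=1e-8)
```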
- Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models [38.7352992942213]
We propose a novel approach named HINT to improve pre-trained code models with large-scale unlabeled datasets.
HINT includes two main modules: HybrId pseudo-labeled data selection and Noise-tolerant Training.
The experimental results show that HINT can better leverage those unlabeled data in a task-specific way.
arXiv Detail & Related papers (2024-01-02T06:39:00Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
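A small sketch of how the negative-label-suppression idea above could plug into an off-the-shelf pseudo-labeling loss (FixMatch-style): classes rejected by a one-bit "no" answer are masked out before pseudo-labels are formed. The tensor layout, masking value, and confidence threshold are assumptions for illustration.
```python
import torch
import torch.nn.functional as F

def suppress_negative_labels(logits, negative_mask):
    """Mask out classes that a one-bit 'no' answer has ruled out.
    negative_mask: bool [batch, num_classes], True where the class was rejected."""
    return logits.masked_fill(negative_mask, float("-inf"))

def pseudo_label_loss(student_logits, teacher_logits, negative_mask, conf_thresh=0.95):
    """FixMatch-style pseudo-labeling with negative label suppression applied
    before pseudo-labels are formed (illustrative wiring)."""
    with torch.no_grad():
        probs = F.softmax(suppress_negative_labels(teacher_logits, negative_mask), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > conf_thresh
    if not keep.any():
        return student_logits.sum() * 0.0       # no confident pseudo-labels in this batch
    return F.cross_entropy(student_logits[keep], pseudo[keep])
```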
- Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification [49.36348058247138]
We tackle the problem of cross-domain few-shot classification by making a small proportion of unlabeled images in the target domain accessible in the training stage.
We meticulously design a cross-level knowledge distillation method, which can strengthen the ability of the model to extract more discriminative features in the target dataset.
Our approach can surpass the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks.
arXiv Detail & Related papers (2023-11-04T12:28:04Z)
- Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy [34.02350195269502]
We formalize the problem of data pruning with re-labeling.
We propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples.
arXiv Detail & Related papers (2023-11-02T05:40:26Z)
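The Prune4Rel summary above describes selecting a subset that maximizes the total neighborhood confidence of all training examples. The following is a rough simplification under a linear coverage assumption; the paper's actual objective and optimizer may differ.
```python
import numpy as np

def prune_by_neighborhood_confidence(neighbors, confidence, budget):
    """neighbors: [n, k] int array of each example's k nearest neighbors in feature space.
    confidence: [n] float array of model confidence per example.
    Keep `budget` examples whose selection adds the most confidence to the
    neighborhoods they appear in (a simplified, linear surrogate of the objective)."""
    n, _ = neighbors.shape
    # how often each example shows up as someone's neighbor
    times_listed = np.bincount(neighbors.ravel(), minlength=n)
    gain = confidence * times_listed           # marginal contribution under the surrogate
    selected = np.argsort(-gain)[:budget]      # top-k equals greedy for a linear objective
    return np.sort(selected)
```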
- Fine tuning Pre trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
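A minimal sketch of a neighbor-consistency regularizer in the spirit of the paper above: each example's prediction is pulled toward a similarity-weighted average of its feature-space neighbors' predictions. Restricting neighbors to the current batch and using a KL term are simplifying assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(logits, features, k=5, temperature=0.1):
    """Pull each example's prediction toward a similarity-weighted average of its
    k nearest in-batch neighbors' predictions (requires batch size > k)."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()                              # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-matches
    topk_sim, topk_idx = sim.topk(k, dim=1)
    weights = F.softmax(topk_sim / temperature, dim=1)   # [batch, k]

    probs = F.softmax(logits, dim=1)
    neighbor_probs = probs[topk_idx]                     # [batch, k, num_classes]
    target = (weights.unsqueeze(-1) * neighbor_probs).sum(dim=1).detach()
    return F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
```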
- HydraMix-Net: A Deep Multi-task Semi-supervised Learning Approach for Cell Detection and Classification [14.005379068469361]
Semi-supervised techniques remove the need for large-scale labelled sets by exploiting unlabelled data to improve the performance of a model.
We propose HydraMix-Net, a semi-supervised deep multi-task classification and localization approach for medical imaging.
arXiv Detail & Related papers (2020-08-11T15:00:59Z)
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
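As a sketch of how uncertainty estimates can gate self-training as in the paper above, the code below uses Monte Carlo dropout and predictive entropy to decide which pseudo-labeled examples to keep; the specific estimator and threshold are illustrative assumptions rather than the paper's exact procedure.
```python
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, n_samples=10):
    """Predictive mean and entropy from stochastic forward passes with dropout kept on.
    (A fuller implementation would keep batch-norm layers in eval mode.)"""
    model.train()                              # keep dropout stochastic at inference
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)
    entropy = -(mean * mean.clamp(min=1e-8).log()).sum(dim=1)
    return mean, entropy

def select_for_self_training(mean_probs, entropy, max_entropy=0.5):
    """Keep only low-uncertainty unlabeled examples for the next self-training round."""
    conf, pseudo = mean_probs.max(dim=1)
    keep = entropy < max_entropy
    return pseudo[keep], conf[keep], keep
```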
This list is automatically generated from the titles and abstracts of the papers on this site.