Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks
- URL: http://arxiv.org/abs/2309.17002v2
- Date: Mon, 11 Mar 2024 15:59:28 GMT
- Title: Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks
- Authors: Hao Chen, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie,
Masashi Sugiyama, Bhiksha Raj
- Abstract summary: This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise.
- Score: 91.15120211190519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training on large-scale datasets and then fine-tuning on downstream tasks
have become a standard practice in deep learning. However, pre-training data
often contain label noise that may adversely affect the generalization of the
model. This paper aims to understand the nature of noise in pre-training
datasets and to mitigate its impact on downstream tasks. More specifically,
through extensive experiments with supervised models pre-trained on synthetic
noisy ImageNet-1K and YFCC15M datasets, we demonstrate that while slight noise
in pre-training can benefit in-domain (ID) transfer performance, where the
training and testing data share the same distribution, it always deteriorates
out-of-domain (OOD) performance, where the training and testing distributions
differ. We empirically verify that the reason is that noise in pre-training
shapes the feature space differently. We then propose a lightweight black-box
tuning method (NMTune) that applies affine transformations to the feature space
to mitigate the malignant effect of noise and improve generalization on both ID
and OOD tasks, considering that one may not be able to fully fine-tune or even
access the pre-trained models.
vision and language models that are pre-trained on noisy data for evaluation of
our approach. Our analysis and results show the importance of this interesting
and novel research direction, which we term Noisy Model Learning.
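
The black-box setting described above is concrete enough to sketch. Below is a minimal illustration in PyTorch, assuming a frozen encoder that maps inputs to feat_dim-dimensional features; only a small affine map and a downstream head are trained on top of it. The class and parameter names are hypothetical, and the full NMTune objective in the paper includes regularization terms not shown here.

```python
import torch
import torch.nn as nn

class AffineFeatureTune(nn.Module):
    """Minimal sketch of black-box feature-space tuning: the pre-trained
    encoder is frozen, and only a lightweight affine transform plus a
    downstream head are trained. Names are illustrative, not from the paper."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():    # black-box: never updated
            p.requires_grad_(False)
        self.affine = nn.Linear(feat_dim, feat_dim)   # affine map on features
        self.head = nn.Linear(feat_dim, num_classes)  # downstream classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                  # features only; no encoder gradients
            z = self.encoder(x)
        return self.head(self.affine(z))
```

The design point is that nothing inside `encoder` is updated or even needs gradient access, which matches the constraint that the pre-trained model may only be reachable as a feature extractor.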
Related papers
- Learning with Noisy Foundation Models [95.50968225050012] (2024-03-11)
  This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
  We propose a tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise and improve generalization.
- Fine-tuning Pre-trained Models for Robustness Under Noisy Labels [34.68018860186995] (2023-10-24)
  The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
  We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
- Double Descent and Overfitting under Noisy Inputs and Distribution Shift for Linear Denoisers [3.481985817302898] (2023-05-26)
  A concern with studying supervised denoising is that one might not always have noiseless training data from the test distribution.
  Motivated by this, we study supervised denoising and noisy-input regression under distribution shift.
- Solving Inverse Problems with Score-Based Generative Priors learned from Noisy Data [1.7969777786551424] (2023-05-02)
  SURE-Score is an approach for learning score-based generative models using training samples corrupted by additive Gaussian noise (see the SURE sketch after this list).
  We demonstrate the generality of SURE-Score by learning priors and applying posterior sampling to ill-posed inverse problems in two practical applications.
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601] (2022-12-20)
  We present a large empirical study quantifying the sometimes severe loss in performance caused by different types of input noise across a range of datasets and model sizes.
  We propose a lightweight method for detecting and removing such noise in the input during model inference, without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
- Pre-training via Denoising for Molecular Property Prediction [53.409242538744444] (2022-05-31)
  We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
  Inspired by recent advances in noise regularization, our pre-training objective is based on denoising (a minimal sketch follows this list).
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005] (2022-03-02)
  We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
  In particular, the optimal noise distribution is different from the data's and even from a different family.
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718] (2021-11-24)
  Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
  Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
  We propose a novel strategy for selecting a subset of the pre-training data to help improve generalization on the target task.
- Self-Adaptive Training: beyond Empirical Risk Minimization [15.59721834388181] (2020-02-24)
  We propose a new training algorithm that dynamically corrects problematic labels using the model's own predictions, without incurring extra computational cost (a sketch of the label-correction rule follows this list).
  Self-adaptive training significantly improves generalization under various levels of noise and mitigates overfitting in both natural and adversarial training.
  Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications.
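
For the SURE-Score entry above, the key ingredient is Stein's unbiased risk estimate (SURE), which lets a denoiser be trained from noisy samples y = x + sigma * n alone; the paper combines this with score matching, which is not shown here. A hedged sketch using a single-probe Monte Carlo estimate of the divergence term; the function name and signature are assumptions, not the paper's code.

```python
import torch

def sure_loss(denoiser, y: torch.Tensor, sigma: float, eps: float = 1e-3):
    """SURE for y = x + sigma * n, n ~ N(0, I): an unbiased estimate of the
    denoising MSE that needs no clean data. Illustrative sketch only."""
    out = denoiser(y)
    b = torch.randn_like(y)                    # random probe for the divergence
    div = (b * (denoiser(y + eps * b) - out)).sum() / eps
    n = y.numel()
    return ((y - out) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div
```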
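The "Pre-training via Denoising" entry rests on a simple objective: perturb equilibrium 3D coordinates with Gaussian noise and train the network to predict that noise. A minimal sketch, assuming a model that takes atom types and coordinates and returns per-atom 3D vectors; the signature and the sigma value are illustrative.

```python
import torch

def denoising_pretrain_step(model, atom_types, coords, sigma: float = 0.04):
    """One denoising pre-training step: corrupt equilibrium coordinates and
    regress the added noise. Model signature is an assumption."""
    noise = sigma * torch.randn_like(coords)        # corrupt the 3D structure
    pred_noise = model(atom_types, coords + noise)  # per-atom 3D predictions
    loss = ((pred_noise - noise) ** 2).mean()       # predict the noise itself
    loss.backward()
    return loss.detach()
```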
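The Self-Adaptive Training entry describes correcting labels with the model's own predictions. One common reading is an exponential moving average of predictions used as soft targets; the sketch below follows that reading with illustrative hyperparameters (the original algorithm also includes details such as a warm-up period and sample re-weighting that this sketch omits).

```python
import torch
import torch.nn.functional as F

class SelfAdaptiveTargets:
    """Soft targets that start as the given (possibly noisy) labels and drift
    toward the model's predictions via an EMA. Names are illustrative."""

    def __init__(self, labels: torch.Tensor, num_classes: int, alpha: float = 0.9):
        self.targets = F.one_hot(labels, num_classes).float()  # init from labels
        self.alpha = alpha

    def loss(self, logits: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        # idx are the dataset indices of this batch; keep targets on one device
        probs = logits.softmax(dim=-1).detach()
        self.targets[idx] = (self.alpha * self.targets[idx]
                             + (1 - self.alpha) * probs)       # EMA correction
        return -(self.targets[idx] * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```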