Learning with Noisy Foundation Models
- URL: http://arxiv.org/abs/2403.06869v1
- Date: Mon, 11 Mar 2024 16:22:41 GMT
- Title: Learning with Noisy Foundation Models
- Authors: Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie,
Masashi Sugiyama, Bhiksha Raj
- Abstract summary: This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space, mitigating the malignant effect of noise and improving generalization.
- Score: 95.50968225050012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models are usually pre-trained on large-scale datasets and then
adapted to downstream tasks through tuning. However, the large-scale
pre-training datasets, often inaccessible or too expensive to handle, can
contain label noise that may adversely affect the generalization of the model
and pose unexpected risks. This paper stands out as the first work to
comprehensively understand and analyze the nature of noise in pre-training
datasets and then effectively mitigate its impacts on downstream tasks.
Specifically, through extensive experiments of fully-supervised and image-text
contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M
datasets, we demonstrate that, while slight noise in pre-training can benefit
in-domain (ID) performance, where the training and testing data share a similar
distribution, it always deteriorates out-of-domain (OOD) performance, where
training and testing distributions are significantly different. These
observations hold regardless of pre-training dataset scale, noise type, model
architecture, pre-training objective, downstream tuning method, and downstream
application. We empirically ascertain that the cause is that pre-training
noise shapes the feature space differently. We then propose a tuning method
(NMTune) that applies an affine transformation to the feature space to
mitigate the malignant effect of noise and improve generalization; it is
applicable in both parameter-efficient and black-box tuning settings. We
additionally conduct extensive experiments on popular vision and language
models, including API models, pre-trained with supervised and self-supervised
objectives on realistic noisy data. Our analysis and results demonstrate the
importance of this novel and fundamental research direction, which we term
Noisy Model Learning.
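The abstract describes NMTune only at a high level, so the following is a minimal sketch of the "transform the feature space" idea under stated assumptions: a frozen (possibly API-only) backbone supplies features, and only a small learnable affine head plus a task classifier are trained on top. The class name, dimensions, and training setup are hypothetical illustrations, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AffineTuneHead(nn.Module):
    """Hypothetical black-box tuning head: a learnable affine map over
    frozen backbone features, followed by a task classifier. This only
    illustrates transforming the feature space; the actual NMTune
    objective is not specified in the abstract."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.affine = nn.Linear(feat_dim, feat_dim)        # affine map: Wz + b
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Features come from a frozen model, so no gradients reach the
        # pre-trained backbone; this matches the black-box setting.
        return self.classifier(self.affine(features))

# Usage: extract features once with the (black-box) backbone, then train
# only this lightweight head on the downstream labels.
head = AffineTuneHead(feat_dim=768, num_classes=10)
feats = torch.randn(4, 768)   # stand-in for extracted backbone features
logits = head(feats)          # shape: (4, 10)
```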
Related papers
- Robust Neural Processes for Noisy Data [1.7268667700090563]
We study the behavior of in-context learning models when data is contaminated by noise.
We find that the models that perform best on clean data are different from the models that perform best on noisy data.
We propose a simple method to train NP models that makes them more robust to noisy data.
arXiv Detail & Related papers (2024-11-03T20:00:55Z)
- Fine-tuning Pre-trained Models for Robustness Under Noisy Labels [34.68018860186995]
The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models.
We introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models.
arXiv Detail & Related papers (2023-10-24T20:28:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z)
- Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty; a brief sketch of this idea follows the list below.
Our method is generally applicable to various tasks, including computer vision, natural language processing, and structural data analysis.
arXiv Detail & Related papers (2022-05-26T13:21:01Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution differs from the data distribution and can even come from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
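The noise-stability summary above admits a short sketch. Assuming the common formulation of that idea, in which model parameters are perturbed with small Gaussian noise and each unlabeled example is scored by how much its output shifts, a hedged illustration follows; the function name, noise scale, and trial count are assumptions, not the cited paper's exact procedure.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def noise_stability_score(model: nn.Module, x: torch.Tensor,
                          sigma: float = 0.01, n_trials: int = 8) -> torch.Tensor:
    """Illustrative uncertainty score: mean squared output deviation
    under small Gaussian parameter perturbations. Higher scores mean
    less stable, hence more uncertain, predictions."""
    base = model(x)
    scores = torch.zeros(x.size(0))
    for _ in range(n_trials):
        noisy = copy.deepcopy(model)              # perturb a copy, keep the original
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))   # inject parameter noise
        scores += (noisy(x) - base).pow(2).sum(dim=-1)
    return scores / n_trials

# Usage: rank an unlabeled pool and query the least stable examples.
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
pool = torch.randn(100, 16)
query_idx = noise_stability_score(net, pool).topk(10).indices
```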