Pre-training also Transfers Non-Robustness
- URL: http://arxiv.org/abs/2106.10989v1
- Date: Mon, 21 Jun 2021 11:16:13 GMT
- Title: Pre-training also Transfers Non-Robustness
- Authors: Jiaming Zhang, Jitao Sang, Qi Yi, Huiwen Dong, Jian Yu
- Abstract summary: Despite its recognized contribution to generalization, pre-training also transfers non-robustness from the pre-trained model to the fine-tuned model.
Results validate the effectiveness of the proposed solution in alleviating non-robustness while preserving generalization.
- Score: 20.226917627173126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training has enabled state-of-the-art results on many tasks. In
spite of its recognized contribution to generalization, we observed in this
study that pre-training also transfers non-robustness from the pre-trained
model into the fine-tuned model. Using image classification as an example, we
first conducted experiments on various datasets and network backbones to
explore the factors influencing robustness. We then examined the difference
between the fine-tuned model and the standard model to uncover the reason for
the non-robustness transfer. Finally, we introduce a simple robust
pre-training solution that regularizes the difference between the target and
source tasks. Results validate its effectiveness in alleviating
non-robustness while preserving generalization.
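The abstract does not give the exact formulation of the regularizer; one common way to constrain the difference between target and source tasks is an L2 penalty that keeps the fine-tuned weights close to the pre-trained ones (L2-SP-style regularization). A minimal NumPy sketch under that assumption, where the linear model, the data, and the coefficient `lam` are all illustrative, not the paper's actual method:

```python
import numpy as np

def regularized_loss(theta, theta_pre, X, y, lam=0.1):
    """Task loss (squared error of a linear model) plus an L2 penalty
    that keeps the fine-tuned weights theta near the pre-trained theta_pre."""
    task = np.mean((X @ theta - y) ** 2)
    penalty = lam * np.sum((theta - theta_pre) ** 2)
    return task + penalty

def finetune(theta_pre, X, y, lam=0.1, lr=0.05, steps=200):
    """Plain gradient descent on the regularized objective,
    starting from the pre-trained weights."""
    theta = theta_pre.copy()
    n = len(y)
    for _ in range(steps):
        grad_task = 2.0 / n * X.T @ (X @ theta - y)
        grad_pen = 2.0 * lam * (theta - theta_pre)
        theta -= lr * (grad_task + grad_pen)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=64)
theta_pre = np.zeros(4)
theta = finetune(theta_pre, X, y, lam=0.1)
```

Larger `lam` pulls the solution toward the pre-trained weights; `lam=0` recovers unconstrained fine-tuning.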
Related papers
- Refining 3D Point Cloud Normal Estimation via Sample Selection [13.207964615561261]
We introduce a fundamental framework for normal estimation, enhancing existing models through the incorporation of global information and various constraint mechanisms.
We also utilize existing orientation methods to correct estimated non-oriented normals, achieving state-of-the-art performance in both oriented and non-oriented tasks.
arXiv Detail & Related papers (2024-05-20T02:06:10Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation [7.432224771219168]
The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task.
In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks.
arXiv Detail & Related papers (2023-07-12T08:35:24Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities [17.082695183953486]
A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
Here, the underlying assumption is that the biased model resorts to shortcut features.
We introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts loss function.
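In this line of debiasing work, the Product of Experts loss typically combines the main and biased models' log-probabilities before renormalizing, so that examples the biased model already classifies confidently contribute less gradient to the main model. A minimal NumPy sketch of a PoE cross-entropy; the logits and labels are illustrative, and the paper's attribution-similarity weighting is omitted:

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def poe_cross_entropy(main_logits, biased_logits, labels):
    """Product of Experts: sum the two models' log-probabilities,
    renormalize the product, and take cross-entropy against gold labels."""
    combined = log_softmax(main_logits) + log_softmax(biased_logits)
    log_probs = log_softmax(combined)  # renormalize the product distribution
    return -log_probs[np.arange(len(labels)), labels].mean()

main = np.array([[2.0, 0.5], [0.2, 1.5]])
biased = np.array([[3.0, -1.0], [0.0, 0.0]])  # confident only on example 0
labels = np.array([0, 1])
loss = poe_cross_entropy(main, biased, labels)
```

A useful sanity check on the design: when the biased model is uniform (all-zero logits), the product collapses to the main model's distribution, so PoE reduces to the ordinary cross-entropy.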
arXiv Detail & Related papers (2023-02-06T15:21:41Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
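Self-distillation as a regularizer for further pre-training can be read as adding a divergence term between the current model's predictions and those of a frozen copy of itself from the previous stage. A small, architecture-free NumPy sketch of such a loss; the temperature, mixing weight, and logits are stand-ins, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, labels,
                           alpha=0.5, t=2.0):
    """Cross-entropy on the labels plus a KL term pulling the student's
    temperature-softened predictions toward the frozen teacher's."""
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    p_teacher = softmax(teacher_logits, t)
    p_student = softmax(student_logits, t)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * kl

logits = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
loss = self_distillation_loss(logits, logits, labels)
```

When student and teacher agree exactly, the KL term vanishes and only the label cross-entropy remains, which is the expected behavior of a distillation regularizer.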
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - On Transfer of Adversarial Robustness from Pretraining to Downstream Tasks [1.8900691517352295]
We show that the robustness of a linear predictor on downstream tasks can be constrained by the robustness of its underlying representation.
Our results offer an initial step towards characterizing the requirements of the representation function for reliable post-adaptation performance.
arXiv Detail & Related papers (2022-08-07T23:00:40Z) - MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, each introduces additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
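MEMO's core idea, marginal entropy minimization, can be stated compactly: average the model's predicted distribution over several augmented copies of the test input, then take a gradient step on the model parameters that reduces the entropy of that marginal. A toy NumPy sketch with a linear classifier, additive-noise "augmentations", and a numerical gradient; all names and settings are illustrative, not MEMO's actual implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def augmented_views(x, rng, n_aug=8, scale=0.1):
    """Fixed set of noisy copies standing in for data augmentations."""
    return [x + scale * rng.normal(size=x.shape) for _ in range(n_aug)]

def marginal_entropy(W, views):
    """Entropy of the prediction averaged over the augmented views."""
    marginal = np.mean([softmax(W @ v) for v in views], axis=0)
    return -(marginal * np.log(marginal)).sum()

def memo_step(W, views, lr=0.2, eps=1e-5):
    """One adaptation step: forward-difference gradient descent
    on the marginal entropy with respect to the weights."""
    grad = np.zeros_like(W)
    base = marginal_entropy(W, views)
    for idx in np.ndindex(*W.shape):
        W_pert = W.copy()
        W_pert[idx] += eps
        grad[idx] = (marginal_entropy(W_pert, views) - base) / eps
    return W - lr * grad

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 5))   # toy pre-trained classifier
x = rng.normal(size=5)              # single test input
views = augmented_views(x, rng)
W_adapted = memo_step(W, views)
```

A real implementation would backpropagate through the model instead of using finite differences; the sketch only shows that the adaptation objective needs nothing beyond the unlabeled test input and an augmentation function.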
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.