To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in
Transfer Learning
- URL: http://arxiv.org/abs/2303.03374v3
- Date: Mon, 15 Jan 2024 19:12:13 GMT
- Title: To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in
Transfer Learning
- Authors: Ildus Sadrtdinov, Dmitrii Pozdeev, Dmitry Vetrov, Ekaterina Lobacheva
- Abstract summary: We show that ensembles trained from a single pre-trained checkpoint may be improved by better exploring the pre-train basin.
We propose StarSSE, a more effective modification of Snapshot Ensembles (SSE) for the transfer learning setup.
- Score: 3.514757448524572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning and ensembling are two popular techniques for improving the
performance and robustness of neural networks. Due to the high cost of
pre-training, ensembles of models fine-tuned from a single pre-trained
checkpoint are often used in practice. Such models end up in the same basin of
the loss landscape, which we call the pre-train basin, and thus have limited
diversity. In this work, we show that ensembles trained from a single
pre-trained checkpoint may be improved by better exploring the pre-train basin;
however, leaving the basin results in losing the benefits of transfer learning
and in degradation of the ensemble quality. Based on the analysis of existing
exploration methods, we propose StarSSE, a more effective modification of Snapshot
Ensembles (SSE) for the transfer learning setup, which results in stronger
ensembles and uniform model soups.
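The setup the abstract describes can be summarised in a few lines of PyTorch. The sketch below is illustrative and is not the authors' StarSSE code: it fine-tunes several copies of one pre-trained checkpoint (so all members stay near the pre-train basin) and then combines them either as a prediction ensemble or as a uniform model soup (weight average). The function names, the plain SGD fine-tuning loop, and all hyperparameters are assumptions; the snapshot-style schedule that distinguishes SSE/StarSSE is not reproduced here.

```python
# Hedged sketch: ensembling and "souping" models fine-tuned from one checkpoint.
# Not the paper's StarSSE implementation; names and hyperparameters are placeholders.
import copy
import torch
import torch.nn.functional as F


def finetune(model, loader, epochs=5, lr=1e-3, device="cpu"):
    """One ordinary fine-tuning run starting from the given weights."""
    model = model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model


def build_members(pretrained_state, make_model, loader, n_members=4):
    """Several independent fine-tuning runs from the same pre-trained checkpoint."""
    members = []
    for seed in range(n_members):
        torch.manual_seed(seed)  # vary data order / augmentation between members
        model = make_model()
        model.load_state_dict(pretrained_state)
        members.append(finetune(model, loader))
    return members


@torch.no_grad()
def ensemble_predict(members, x):
    """Deep ensemble: average the softmax outputs of all members."""
    probs = torch.stack([F.softmax(m.eval()(x), dim=-1) for m in members])
    return probs.mean(dim=0)


def uniform_soup(members, make_model):
    """Uniform model soup: average the weights of all members into one model."""
    soup_state = copy.deepcopy(members[0].state_dict())
    for key, value in soup_state.items():
        if value.dtype.is_floating_point:  # skip integer buffers (e.g. BN counters)
            soup_state[key] = torch.stack(
                [m.state_dict()[key] for m in members]).mean(dim=0)
    soup = make_model()
    soup.load_state_dict(soup_state)
    return soup
```

Weight averaging in `uniform_soup` is only meaningful when the members remain in the same loss-landscape basin, which is exactly the regime the abstract argues for: exploring the pre-train basin without leaving it.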
Related papers
- The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis [60.52921835351632]
This paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints.
We confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes.
In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints.
arXiv Detail & Related papers (2024-04-01T16:00:01Z) - Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem [12.185261182744377]
This work conceptualizes one specific cause of poor transfer, accentuated in the reinforcement learning setting.
A model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning.
We show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities.
arXiv Detail & Related papers (2024-02-05T10:30:47Z) - What Happens During Finetuning of Vision Transformers: An Invariance
Based Investigation [7.432224771219168]
The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task.
In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks.
arXiv Detail & Related papers (2023-07-12T08:35:24Z) - Continual Learning with Pretrained Backbones by Tuning in the Input
Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy that makes the fine-tuning procedure more effective by keeping the pre-trained part of the network fixed and learning not only the usual classification head but also a set of newly-introduced learnable parameters (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-06-05T15:11:59Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both in multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z) - Continual Learning of Neural Machine Translation within Low Forgetting
Risk Regions [21.488675531980444]
We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from a misestimation problem.
We propose a two-stage training method based on the local features of the real loss.
We conduct experiments on domain adaptation and more challenging language adaptation tasks, and the experimental results show that our method can achieve significant improvements.
arXiv Detail & Related papers (2022-11-03T01:21:10Z) - The Lottery Tickets Hypothesis for Supervised and Self-supervised
Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant source models.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
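For the "Continual Learning with Pretrained Backbones by Tuning in the Input Space" entry above, a minimal sketch of the general idea follows: the pre-trained backbone stays frozen, and only a new classification head plus a set of learnable input-space parameters are trained. The concrete parameterisation (an additive learnable perturbation on the input) and all names are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of frozen-backbone, input-space tuning; the additive input
# perturbation below is an assumed parameterisation, not the paper's method.
import torch
import torch.nn as nn


class InputSpaceTuner(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int,
                 input_shape=(3, 224, 224)):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # pre-trained part stays fixed
            p.requires_grad_(False)
        # newly-introduced learnable parameters acting in the input space
        self.input_delta = nn.Parameter(torch.zeros(1, *input_shape))
        self.head = nn.Linear(feat_dim, n_classes)  # usual classification head

    def forward(self, x):
        feats = self.backbone(x + self.input_delta)  # broadcast over the batch
        return self.head(feats)


def trainable_params(model: InputSpaceTuner):
    """Only the input-space parameters and the head are handed to the optimizer."""
    return [model.input_delta, *model.head.parameters()]
```

Because only `input_delta` and the head are optimized, the pre-trained weights are never updated, which is what makes this style of fine-tuning attractive for continual learning with a shared backbone.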
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.