To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in
Transfer Learning
- URL: http://arxiv.org/abs/2303.03374v3
- Date: Mon, 15 Jan 2024 19:12:13 GMT
- Title: To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in
Transfer Learning
- Authors: Ildus Sadrtdinov, Dmitrii Pozdeev, Dmitry Vetrov, Ekaterina Lobacheva
- Abstract summary: We show that ensembles trained from a single pre-trained checkpoint may be improved by better exploring the pre-train basin.
We propose StarSSE, a more effective modification of Snapshot Ensembles (SSE) for the transfer learning setup.
- Score: 3.514757448524572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning and ensembling are two popular techniques for improving the
performance and robustness of neural networks. Due to the high cost of
pre-training, ensembles of models fine-tuned from a single pre-trained
checkpoint are often used in practice. Such models end up in the same basin of
the loss landscape, which we call the pre-train basin, and thus have limited
diversity. In this work, we show that ensembles trained from a single
pre-trained checkpoint may be improved by better exploring the pre-train basin;
however, leaving the basin results in losing the benefits of transfer learning
and in degradation of the ensemble quality. Based on the analysis of existing
exploration methods, we propose StarSSE, a more effective modification of Snapshot
Ensembles (SSE) for the transfer learning setup, which results in stronger
ensembles and uniform model soups.
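The setup the abstract describes can be summarised in a few lines of PyTorch. The sketch below is illustrative and is not the authors' StarSSE code: it fine-tunes several copies of one pre-trained checkpoint (so all members stay near the pre-train basin) and then combines them either as a prediction ensemble or as a uniform model soup (weight average). The function names, the plain SGD fine-tuning loop, and all hyperparameters are assumptions; the snapshot-style schedule that distinguishes SSE/StarSSE is not reproduced here.

```python
# Hedged sketch: ensembling and "souping" models fine-tuned from one checkpoint.
# Not the paper's StarSSE implementation; names and hyperparameters are placeholders.
import copy
import torch
import torch.nn.functional as F


def finetune(model, loader, epochs=5, lr=1e-3, device="cpu"):
    """One ordinary fine-tuning run starting from the given weights."""
    model = model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model


def build_members(pretrained_state, make_model, loader, n_members=4):
    """Several independent fine-tuning runs from the same pre-trained checkpoint."""
    members = []
    for seed in range(n_members):
        torch.manual_seed(seed)  # vary data order / augmentation between members
        model = make_model()
        model.load_state_dict(pretrained_state)
        members.append(finetune(model, loader))
    return members


@torch.no_grad()
def ensemble_predict(members, x):
    """Deep ensemble: average the softmax outputs of all members."""
    probs = torch.stack([F.softmax(m.eval()(x), dim=-1) for m in members])
    return probs.mean(dim=0)


def uniform_soup(members, make_model):
    """Uniform model soup: average the weights of all members into one model."""
    soup_state = copy.deepcopy(members[0].state_dict())
    for key, value in soup_state.items():
        if value.dtype.is_floating_point:  # skip integer buffers (e.g. BN counters)
            soup_state[key] = torch.stack(
                [m.state_dict()[key] for m in members]).mean(dim=0)
    soup = make_model()
    soup.load_state_dict(soup_state)
    return soup
```

Weight averaging in `uniform_soup` is only meaningful when the members remain in the same loss-landscape basin, which is exactly the regime the abstract argues for: exploring the pre-train basin without leaving it.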
Related papers
- The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis [60.52921835351632]
This paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints.
We confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes.
In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints.
arXiv Detail & Related papers (2024-04-01T16:00:01Z) - Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem [12.185261182744377]
This work conceptualizes one specific cause of poor transfer, accentuated in the reinforcement learning setting.
A model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning.
We show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities.
arXiv Detail & Related papers (2024-02-05T10:30:47Z) - What Happens During Finetuning of Vision Transformers: An Invariance
Based Investigation [7.432224771219168]
The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task.
In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks.
arXiv Detail & Related papers (2023-07-12T08:35:24Z) - Continual Learning with Pretrained Backbones by Tuning in the Input
Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy that makes the fine-tuning procedure more effective by keeping the pre-trained part of the network fixed and learning not only the usual classification head but also a set of newly-introduced learnable parameters (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-06-05T15:11:59Z) - On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both in multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z) - Continual Learning of Neural Machine Translation within Low Forgetting
Risk Regions [21.488675531980444]
We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from a misestimation problem.
We propose a two-stage training method based on the local features of the real loss.
We conduct experiments on domain adaptation and more challenging language adaptation tasks, and the experimental results show that our method can achieve significant improvements.
arXiv Detail & Related papers (2022-11-03T01:21:10Z) - The Lottery Tickets Hypothesis for Supervised and Self-supervised
Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant source models.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
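For the "Continual Learning with Pretrained Backbones by Tuning in the Input Space" entry above, a minimal sketch of the general idea follows: the pre-trained backbone stays frozen, and only a new classification head plus a set of learnable input-space parameters are trained. The concrete parameterisation (an additive learnable perturbation on the input) and all names are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of frozen-backbone, input-space tuning; the additive input
# perturbation below is an assumed parameterisation, not the paper's method.
import torch
import torch.nn as nn


class InputSpaceTuner(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int,
                 input_shape=(3, 224, 224)):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # pre-trained part stays fixed
            p.requires_grad_(False)
        # newly-introduced learnable parameters acting in the input space
        self.input_delta = nn.Parameter(torch.zeros(1, *input_shape))
        self.head = nn.Linear(feat_dim, n_classes)  # usual classification head

    def forward(self, x):
        feats = self.backbone(x + self.input_delta)  # broadcast over the batch
        return self.head(feats)


def trainable_params(model: InputSpaceTuner):
    """Only the input-space parameters and the head are handed to the optimizer."""
    return [model.input_delta, *model.head.parameters()]
```

Because only `input_delta` and the head are optimized, the pre-trained weights are never updated, which is what makes this style of fine-tuning attractive for continual learning with a shared backbone.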
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.