Optimal transfer protocol by incremental layer defrosting
- URL: http://arxiv.org/abs/2303.01429v1
- Date: Thu, 2 Mar 2023 17:32:11 GMT
- Title: Optimal transfer protocol by incremental layer defrosting
- Authors: Federica Gerace, Diego Doimo, Stefano Sarao Mannelli, Luca Saglietti,
Alessandro Laio
- Abstract summary: Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning is a powerful tool enabling model training with limited
amounts of data. This technique is particularly useful in real-world problems
where data availability is often a serious limitation. The simplest transfer
learning protocol is based on "freezing" the feature-extractor layers of a
network pre-trained on a data-rich source task, and then adapting only the last
layers to a data-poor target task. This workflow is based on the assumption
that the feature maps of the pre-trained model are qualitatively similar to the
ones that would have been learned with enough data on the target task. In this
work, we show that this protocol is often sub-optimal, and the largest
performance gain may be achieved when smaller portions of the pre-trained
network are kept frozen. In particular, we make use of a controlled framework
to identify the optimal transfer depth, which turns out to depend non-trivially
on the amount of available training data and on the degree of source-target
task correlation. We then characterize transfer optimality by analyzing the
internal representations of two networks trained from scratch on the source and
the target task through multiple established similarity measures.
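The freezing protocol and the transfer-depth sweep described in the abstract can be illustrated with a minimal sketch. This is not the paper's code; the network stand-in and helper names (`make_network`, `freeze_up_to`, `trainable_layers`) are assumptions introduced purely to show how varying the frozen depth changes which layers adapt to the target task:

```python
def make_network(n_layers):
    """A stand-in network: each layer is just a dict with a frozen flag."""
    return [{"name": f"layer_{i}", "frozen": False} for i in range(n_layers)]

def freeze_up_to(network, depth):
    """Freeze the first `depth` layers (the pre-trained feature extractor);
    the remaining layers stay trainable and adapt to the target task."""
    for i, layer in enumerate(network):
        layer["frozen"] = i < depth
    return network

def trainable_layers(network):
    """Names of the layers that would receive gradient updates."""
    return [layer["name"] for layer in network if not layer["frozen"]]

net = make_network(5)

# Standard protocol: freeze everything but the last layer.
freeze_up_to(net, 4)
print(trainable_layers(net))  # ['layer_4']

# The paper's finding: an intermediate frozen depth can transfer better,
# so in practice one would sweep `depth` and keep the best validation score.
freeze_up_to(net, 2)
print(trainable_layers(net))  # ['layer_2', 'layer_3', 'layer_4']
```

In a real framework the freezing step would set the per-parameter gradient flag (e.g. `requires_grad = False` in PyTorch) rather than a dict field, but the depth sweep itself is the same.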
Related papers
- Diffusion-based Neural Network Weights Generation [85.6725307453325]
We propose an efficient and adaptive transfer learning scheme through dataset-conditioned pretrained weights sampling.
Specifically, we use a latent diffusion model with a variational autoencoder that can reconstruct the neural network weights.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Analysis of Task Transferability in Large Pre-trained Classifiers [11.517862889784293]
We analyze the transfer of performance for classification tasks, when only the last linear layer of the source model is fine-tuned on the target task.
We propose a novel Task Transfer Analysis approach that transforms the source distribution (and classifier) by changing the class prior distribution, label, and feature spaces.
We perform a large-scale empirical study by using state-of-the-art pre-trained models and demonstrate the effectiveness of our bound and optimization at predicting transferability.
arXiv Detail & Related papers (2023-07-03T08:06:22Z)
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data show that the size of the target-task training set often has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Probing transfer learning with a model of synthetic correlated datasets [11.53207294639557]
Transfer learning can significantly improve the sample efficiency of neural networks.
We re-think a solvable model of synthetic data as a framework for modeling correlation between datasets.
We show that our model can capture a range of salient features of transfer learning with real data.
arXiv Detail & Related papers (2021-06-09T22:15:41Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
- Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
arXiv Detail & Related papers (2020-06-30T04:39:36Z)
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.