Frozen Overparameterization: A Double Descent Perspective on Transfer
Learning of Deep Neural Networks
- URL: http://arxiv.org/abs/2211.11074v2
- Date: Mon, 12 Jun 2023 17:39:15 GMT
- Title: Frozen Overparameterization: A Double Descent Perspective on Transfer
Learning of Deep Neural Networks
- Authors: Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk
- Abstract summary: We study the generalization behavior of transfer learning of deep neural networks (DNNs).
We show that the test error evolution during the target training has a more significant double descent effect when the target training dataset is sufficiently large.
Also, we show that the double descent phenomenon may make a transfer from a less related source task better than a transfer from a more related source task.
- Score: 27.17697714584768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the generalization behavior of transfer learning of deep neural
networks (DNNs). We adopt the overparameterization perspective -- featuring
interpolation of the training data (i.e., approximately zero train error) and
the double descent phenomenon -- to explain the delicate effect of the transfer
learning setting on generalization performance. We study how the generalization
behavior of transfer learning is affected by the dataset size in the source and
target tasks, the number of transferred layers that are kept frozen in the
target DNN training, and the similarity between the source and target tasks. We
show that the test error evolution during the target DNN training has a more
significant double descent effect when the target training dataset is
sufficiently large. In addition, a larger source training dataset can yield
slower target DNN training. Moreover, we demonstrate that the number of frozen
layers can determine whether the transfer learning is effectively
underparameterized or overparameterized and, in turn, this may induce a
freezing-wise double descent phenomenon that determines the relative success or
failure of learning. Also, we show that the double descent phenomenon may make
a transfer from a less related source task better than a transfer from a more
related source task. We establish our results using image classification
experiments with the ResNet, DenseNet, and vision transformer (ViT)
architectures.
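As an illustration of the freezing knob described above, the following is a minimal, hypothetical PyTorch sketch (not the authors' experimental code; torchvision's ImageNet-pretrained ResNet-18 stands in for a source-trained model). It transfers the network, freezes the first `num_frozen_blocks` child modules, and attaches a fresh target head; sweeping `num_frozen_blocks` traces the freezing-wise axis along which the paper reports a double descent.

```python
import torch.nn as nn
from torchvision import models

def build_target_model(num_frozen_blocks: int, num_target_classes: int) -> nn.Module:
    # Stand-in source model: torchvision's ImageNet-pretrained ResNet-18
    # (the `weights` argument requires torchvision >= 0.13).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    # Replace the source head with a freshly initialized target head.
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)

    # Freeze the first `num_frozen_blocks` child modules
    # (conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc).
    for child in list(model.children())[:num_frozen_blocks]:
        for p in child.parameters():
            p.requires_grad = False
    return model

# The number of frozen blocks decides how many parameters remain trainable,
# i.e., whether the target training is effectively under- or overparameterized.
target_model = build_target_model(num_frozen_blocks=6, num_target_classes=10)
trainable = sum(p.numel() for p in target_model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```

During target training, only the unfrozen parameters would be handed to the optimizer, e.g. `torch.optim.SGD((p for p in target_model.parameters() if p.requires_grad), lr=0.01)`.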
Related papers
- Features are fate: a theory of transfer learning in high-dimensional regression [23.840251319669907]
We show that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch.
For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance.
arXiv Detail & Related papers (2024-10-10T17:58:26Z)
- Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning [77.82908213345864]
We find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (the sharpness) is largely independent of the width and depth of the network.
We show that under the neural tangent kernel (NTK) regime, the sharpness exhibits very different dynamics at different scales, thus preventing learning rate transfer. (A minimal sketch for estimating this sharpness appears after this list.)
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Evaluating the structure of cognitive tasks with transfer learning [67.22168759751541]
This study investigates the transferability of deep learning representations between different EEG decoding tasks.
We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets.
arXiv Detail & Related papers (2023-07-28T14:51:09Z)
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data show that the target task training data size often has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- Meta-learning Transferable Representations with a Single Target Domain [46.83481356352768]
Fine-tuning and joint training do not always improve accuracy on downstream tasks.
We propose Meta Representation Learning (MeRLin) to learn transferable features.
MeRLin empirically outperforms previous state-of-the-art transfer learning algorithms on various real-world vision and NLP transfer learning benchmarks.
arXiv Detail & Related papers (2020-11-03T01:57:37Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves over standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition [2.320417845168326]
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models.
The technique of pre-training on one task and then retraining on a new one is called transfer learning.
In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks.
arXiv Detail & Related papers (2020-01-02T14:18:25Z)
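The "Why do Learning Rates Transfer?" entry above attributes learning rate transfer to the stability of the sharpness, i.e., the largest eigenvalue of the training-loss Hessian. The sketch below shows one generic way to estimate that quantity, power iteration on Hessian-vector products; the helper `top_hessian_eigenvalue`, the toy model, and the batch are illustrative assumptions, not code from that paper.

```python
import torch
import torch.nn as nn

def top_hessian_eigenvalue(model, loss_fn, x, y, iters=20):
    """Estimate the largest eigenvalue of the training-loss Hessian
    (the sharpness) via power iteration on Hessian-vector products."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Start from a random unit vector shaped like the parameters.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v with the current unit vector v.
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    return eig

# Toy usage: a small classifier and a random batch.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
print(top_hessian_eigenvalue(model, nn.CrossEntropyLoss(), x, y))
```

Tracking this estimate while scaling width or depth is one way to probe the width/depth independence that the entry reports under $\mu$P.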
This list is automatically generated from the titles and abstracts of the papers in this site.