Frozen Overparameterization: A Double Descent Perspective on Transfer
Learning of Deep Neural Networks
- URL: http://arxiv.org/abs/2211.11074v2
- Date: Mon, 12 Jun 2023 17:39:15 GMT
- Title: Frozen Overparameterization: A Double Descent Perspective on Transfer
Learning of Deep Neural Networks
- Authors: Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk
- Abstract summary: We study the generalization behavior of transfer learning of deep neural networks (DNNs).
We show that the test error evolution during the target training has a more significant double descent effect when the target training dataset is sufficiently large.
Also, we show that the double descent phenomenon may make a transfer from a less related source task better than a transfer from a more related source task.
- Score: 27.17697714584768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the generalization behavior of transfer learning of deep neural
networks (DNNs). We adopt the overparameterization perspective -- featuring
interpolation of the training data (i.e., approximately zero train error) and
the double descent phenomenon -- to explain the delicate effect of the transfer
learning setting on generalization performance. We study how the generalization
behavior of transfer learning is affected by the dataset size in the source and
target tasks, the number of transferred layers that are kept frozen in the
target DNN training, and the similarity between the source and target tasks. We
show that the test error evolution during the target DNN training has a more
significant double descent effect when the target training dataset is
sufficiently large. In addition, a larger source training dataset can yield
slower target DNN training. Moreover, we demonstrate that the number of frozen
layers can determine whether the transfer learning is effectively
underparameterized or overparameterized and, in turn, this may induce a
freezing-wise double descent phenomenon that determines the relative success or
failure of learning. Also, we show that the double descent phenomenon may make
a transfer from a less related source task better than a transfer from a more
related source task. We establish our results using image classification
experiments with the ResNet, DenseNet, and vision transformer (ViT)
architectures.
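As an illustration of the freezing knob described above, the following is a minimal, hypothetical PyTorch sketch (not the authors' experimental code; torchvision's ImageNet-pretrained ResNet-18 stands in for a source-trained model). It transfers the network, freezes the first `num_frozen_blocks` child modules, and attaches a fresh target head; sweeping `num_frozen_blocks` traces the freezing-wise axis along which the paper reports a double descent.

```python
import torch.nn as nn
from torchvision import models

def build_target_model(num_frozen_blocks: int, num_target_classes: int) -> nn.Module:
    # Stand-in source model: torchvision's ImageNet-pretrained ResNet-18
    # (the `weights` argument requires torchvision >= 0.13).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    # Replace the source head with a freshly initialized target head.
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)

    # Freeze the first `num_frozen_blocks` child modules
    # (conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc).
    for child in list(model.children())[:num_frozen_blocks]:
        for p in child.parameters():
            p.requires_grad = False
    return model

# The number of frozen blocks decides how many parameters remain trainable,
# i.e., whether the target training is effectively under- or overparameterized.
target_model = build_target_model(num_frozen_blocks=6, num_target_classes=10)
trainable = sum(p.numel() for p in target_model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```

During target training, only the unfrozen parameters would be handed to the optimizer, e.g. `torch.optim.SGD((p for p in target_model.parameters() if p.requires_grad), lr=0.01)`.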
Related papers
- Features are fate: a theory of transfer learning in high-dimensional regression [23.840251319669907]
We show that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch.
For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance.
arXiv Detail & Related papers (2024-10-10T17:58:26Z)
- Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning [77.82908213345864]
We find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (the sharpness) is largely independent of the width and depth of the network.
We show that under the neural tangent kernel (NTK) regime, the sharpness exhibits very different dynamics at different scales, thus preventing learning rate transfer. (A minimal sketch for estimating this sharpness appears after this list.)
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Evaluating the structure of cognitive tasks with transfer learning [67.22168759751541]
This study investigates the transferability of deep learning representations between different EEG decoding tasks.
We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets.
arXiv Detail & Related papers (2023-07-28T14:51:09Z)
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data show that the target task training data size often has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- Meta-learning Transferable Representations with a Single Target Domain [46.83481356352768]
Fine-tuning and joint training do not always improve accuracy on downstream tasks.
We propose Meta Representation Learning (MeRLin) to learn transferable features.
MeRLin empirically outperforms previous state-of-the-art transfer learning algorithms on various real-world vision and NLP transfer learning benchmarks.
arXiv Detail & Related papers (2020-11-03T01:57:37Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves over standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition [2.320417845168326]
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models.
The technique of pre-training on one task and then retraining on a new one is called transfer learning.
In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks.
arXiv Detail & Related papers (2020-01-02T14:18:25Z)
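The "Why do Learning Rates Transfer?" entry above attributes learning rate transfer to the stability of the sharpness, i.e., the largest eigenvalue of the training-loss Hessian. The sketch below shows one generic way to estimate that quantity, power iteration on Hessian-vector products; the helper `top_hessian_eigenvalue`, the toy model, and the batch are illustrative assumptions, not code from that paper.

```python
import torch
import torch.nn as nn

def top_hessian_eigenvalue(model, loss_fn, x, y, iters=20):
    """Estimate the largest eigenvalue of the training-loss Hessian
    (the sharpness) via power iteration on Hessian-vector products."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Start from a random unit vector shaped like the parameters.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v with the current unit vector v.
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    return eig

# Toy usage: a small classifier and a random batch.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
print(top_hessian_eigenvalue(model, nn.CrossEntropyLoss(), x, y))
```

Tracking this estimate while scaling width or depth is one way to probe the width/depth independence that the entry reports under $\mu$P.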
This list is automatically generated from the titles and abstracts of the papers in this site.