Related papers: Features are fate: a theory of transfer learning in high-dimensional regression

Features are fate: a theory of transfer learning in high-dimensional regression

URL: http://arxiv.org/abs/2410.08194v1
Date: Thu, 10 Oct 2024 17:58:26 GMT
Title: Features are fate: a theory of transfer learning in high-dimensional regression
Authors: Javan Tahir, Surya Ganguli, Grant M. Rotskoff,
Abstract summary: We show that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch. For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance.
Score: 23.840251319669907
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the emergence of large-scale pre-trained neural networks, methods to adapt such "foundation" models to data-limited downstream tasks have become a necessity. Fine-tuning, preference optimization, and transfer learning have all been successfully employed for these purposes when the target task closely resembles the source task, but a precise theoretical understanding of "task similarity" is still lacking. While conventional wisdom suggests that simple measures of similarity between source and target distributions, such as $\phi$-divergences or integral probability metrics, can directly predict the success of transfer, we prove the surprising fact that, in general, this is not the case. We adopt, instead, a feature-centric viewpoint on transfer learning and establish a number of theoretical results that demonstrate that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch. We study deep linear networks as a minimal model of transfer learning in which we can analytically characterize the transferability phase diagram as a function of the target dataset size and the feature space overlap. For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance, especially in the low data limit. These results build on an emerging understanding of feature learning dynamics in deep linear networks, and we demonstrate numerically that the rigorous results we derive for the linear case also apply to nonlinear networks.

Related papers

Transfer Learning in Infinite Width Feature Learning Networks [35.95321041944522]
We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) task can operate in a feature learning regime.<n>We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly gradient networks trained with weight decay.<n>The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks.
arXiv Detail & Related papers (2025-07-06T16:14:43Z)
Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data [2.308168896770315]
We introduce a novel method for transfer learning called Transfer learning through Enhanced Sufficient Representation (TESR) Our approach begins by estimating a sufficient and invariant representation from the source domains. This representation is then enhanced with an independent component derived from the target data, ensuring that it is sufficient for the target domain and adaptable to its specific characteristics.
arXiv Detail & Related papers (2025-02-22T13:18:28Z)
Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data. The simplest transfer learning protocol is based on freezing" the feature-extractor layers of a network pre-trained on a data-rich source task. We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks [27.17697714584768]
We study the generalization behavior of transfer learning of deep neural networks (DNNs) We show that the test error evolution during the target training has a more significant double descent effect when the target training dataset is sufficiently large. Also, we show that the double descent phenomenon may make a transfer from a less related source task better than a transfer from a more related source task.
arXiv Detail & Related papers (2022-11-20T20:26:23Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets. In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset. We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
Probing transfer learning with a model of synthetic correlated datasets [11.53207294639557]
Transfer learning can significantly improve the sample efficiency of neural networks. We re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. We show that our model can capture a range of salient features of transfer learning with real data.
arXiv Detail & Related papers (2021-06-09T22:15:41Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED) TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space. We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
arXiv Detail & Related papers (2020-06-30T04:39:36Z)
Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning. We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity. Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.