Towards Understanding Feature Learning in Parameter Transfer
- URL: http://arxiv.org/abs/2509.22056v1
- Date: Fri, 26 Sep 2025 08:37:54 GMT
- Title: Towards Understanding Feature Learning in Parameter Transfer
- Authors: Hua Yuan, Xuran Meng, Qiufeng Wang, Shiyu Xia, Ning Xu, Xu Yang, Jing Wang, Xin Geng, Yong Rui
- Abstract summary: We analyze a setting in which both the upstream and downstream models are ReLU convolutional neural networks (CNNs). We characterize how the inherited parameters act as carriers of universal knowledge and identify key factors that amplify their beneficial impact on the target task. Our analysis provides insight into why, in certain cases, transferring parameters can lead to lower test accuracy on the target task than training a new model from scratch.
- Score: 47.063219231351916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter transfer is a central paradigm in transfer learning, enabling knowledge reuse across tasks and domains by sharing model parameters between upstream and downstream models. However, when only a subset of parameters from the upstream model is transferred to the downstream model, there remains a lack of theoretical understanding of the conditions under which such partial parameter reuse is beneficial and of the factors that govern its effectiveness. To address this gap, we analyze a setting in which both the upstream and downstream models are ReLU convolutional neural networks (CNNs). Within this theoretical framework, we characterize how the inherited parameters act as carriers of universal knowledge and identify key factors that amplify their beneficial impact on the target task. Furthermore, our analysis provides insight into why, in certain cases, transferring parameters can lead to lower test accuracy on the target task than training a new model from scratch. Numerical experiments and real-world data experiments are conducted to empirically validate our theoretical findings.
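The following is a minimal sketch, not the authors' code, of the partial parameter transfer setting the abstract describes: a subset of convolutional filters from a pretrained upstream ReLU CNN is copied into a downstream CNN, the inherited filters are kept fixed, and the remaining parameters are trained on the target task. The model sizes, the number of transferred filters, and the dummy data are illustrative assumptions, not values from the paper.

```python
# Sketch of partial parameter transfer between two small ReLU CNNs.
# Assumptions: 16 filters per model, 8 filters transferred, toy data.
import torch
import torch.nn as nn

class SmallReluCNN(nn.Module):
    """A tiny ReLU CNN used for both the upstream and downstream models."""
    def __init__(self, num_classes: int, num_filters: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(3, num_filters, kernel_size=3, padding=1)
        self.head = nn.Linear(num_filters, num_classes)

    def forward(self, x):
        h = torch.relu(self.conv(x))   # ReLU feature maps
        h = h.mean(dim=(2, 3))         # global average pooling
        return self.head(h)

upstream = SmallReluCNN(num_classes=10)   # assumed already trained on the source task
downstream = SmallReluCNN(num_classes=5)  # fresh model for the target task

# Transfer only a subset of filters (here: the first 8) from upstream to downstream.
k = 8
with torch.no_grad():
    downstream.conv.weight[:k] = upstream.conv.weight[:k]
    downstream.conv.bias[:k] = upstream.conv.bias[:k]

# Freeze the inherited filters by zeroing their gradients; the matching bias
# entries could be frozen the same way.
def zero_transferred_grad(grad):
    g = grad.clone()
    g[:k] = 0.0
    return g

downstream.conv.weight.register_hook(zero_transferred_grad)

# Standard target-task training step on dummy data (illustration only).
optimizer = torch.optim.SGD(downstream.parameters(), lr=0.1)
x = torch.randn(32, 3, 32, 32)
y = torch.randint(0, 5, (32,))
loss = nn.functional.cross_entropy(downstream(x), y)
loss.backward()
optimizer.step()
```

In this setup the same downstream architecture can also be trained from random initialization, which is the "from scratch" baseline the abstract compares against.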
Related papers
- Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [57.19302613163439]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - Learning a Sparse Neural Network using IHT [1.124958340749622]
As computational power for training neural networks (NNs) increases, so does model complexity in terms of the number of parameters.
This paper draws on results from advanced sparse optimization, particularly those addressing nonlinear differentiable functions, and investigates whether the theoretical prerequisites for the convergence of iterative hard thresholding (IHT) hold in NN training (a minimal IHT sketch is given after this related-papers list).
arXiv Detail & Related papers (2024-04-29T04:10:22Z) - MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.05167902805405]
We present MergeNet, which learns to bridge the gap between the parameter spaces of heterogeneous models. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv Detail & Related papers (2024-04-20T08:34:39Z) - Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z) - Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting [13.99348653165494]
We propose Generative Causal Representation Learning (GCRL) to facilitate knowledge transfer under distribution shifts.
While we evaluate the effectiveness of our proposed method in human trajectory prediction models, GCRL can be applied to other domains as well.
arXiv Detail & Related papers (2023-02-17T00:30:44Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Transfer Learning for Linear Regression: a Statistical Test of Gain [2.1550839871882017]
Transfer learning aims to reuse knowledge from a source dataset for a similar target dataset.
It is shown that the quality of transfer for a new input vector $x$ depends on its representation in an eigenbasis.
A statistical test is constructed to predict whether a fine-tuned model has a lower prediction quadratic risk than the base target model.
arXiv Detail & Related papers (2021-02-18T17:46:26Z) - On the Sparsity of Neural Machine Translation Models [65.49762428553345]
We investigate whether redundant parameters can be reused to achieve better performance.
Experiments and analyses are systematically conducted on different datasets and NMT architectures.
arXiv Detail & Related papers (2020-10-06T11:47:20Z)
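As referenced in the "Learning a Sparse Neural Network using IHT" entry above, the following is a minimal sketch of iterative hard thresholding applied to a neural network layer: after each gradient step, the weights are projected onto the set of k-sparse vectors by keeping only the k largest-magnitude entries. The toy model, data, and sparsity budget are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of iterative hard thresholding (IHT) for a sparse NN layer.
import torch
import torch.nn as nn

def hard_threshold_(weight: torch.Tensor, k: int) -> None:
    """Zero all but the k largest-magnitude entries of `weight`, in place."""
    with torch.no_grad():
        flat = weight.abs().flatten()
        if k < flat.numel():
            threshold = flat.topk(k).values.min()
            weight.mul_((weight.abs() >= threshold).float())

# Toy one-hidden-layer network and random regression data (illustration only).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))
x, y = torch.randn(128, 20), torch.randn(128, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

k = 200  # assumed sparsity budget: nonzeros allowed in the first layer
for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    hard_threshold_(model[0].weight, k)  # IHT projection after each gradient step

print("nonzeros in first layer:", int((model[0].weight != 0).sum()))
```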