Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
- URL: http://arxiv.org/abs/2206.03726v1
- Date: Wed, 8 Jun 2022 08:00:12 GMT
- Title: Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
- Authors: Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long
- Abstract summary: We propose a Hub-Pathway framework to enable knowledge transfer from a model hub.
The proposed framework can be trained end-to-end with the target task-specific loss.
Experimental results on computer vision and reinforcement learning tasks demonstrate that the framework achieves state-of-the-art performance.
- Score: 89.44031286278347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning aims to leverage knowledge from pre-trained models to
benefit the target task. Prior transfer learning work mainly transfers from a
single model. However, with the emergence of deep models pre-trained from
different resources, model hubs consisting of diverse models with various
architectures, pre-trained datasets and learning paradigms are available.
Directly applying single-model transfer learning methods to each model fails to
exploit the abundant knowledge of the model hub and incurs high computational
cost. In this paper, we propose a Hub-Pathway framework to enable knowledge
transfer from a model hub. The framework generates data-dependent pathway
weights, based on which we assign the pathway routes at the input level to
decide which pre-trained models are activated and passed through, and then set
the pathway aggregation at the output level to aggregate the knowledge from
different models to make predictions. The proposed framework can be trained
end-to-end with the target task-specific loss, where it learns to explore
better pathway configurations and exploit the knowledge in pre-trained models
for each target datum. We utilize a noisy pathway generator and design an
exploration loss to further explore different pathways throughout the model
hub. To fully exploit the knowledge in pre-trained models, each model is
further trained on the specific data that activate it, which ensures its
performance and enhances knowledge transfer. Experimental results on computer
vision and reinforcement learning tasks demonstrate that the proposed
Hub-Pathway framework achieves state-of-the-art performance for model hub
transfer learning.
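The routing-and-aggregation idea in the abstract can be illustrated with a short sketch. The module below is a hypothetical, greatly simplified rendition: the pathway-generator architecture, the top-k activation rule, the noise scale, and the assumption that every hub model emits features of a common dimensionality are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch only; names, the top-k routing rule, and the noise scale
# are assumptions made for this example, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HubPathwaySketch(nn.Module):
    def __init__(self, hub_models, feature_dim, num_classes, top_k=2, noise_std=0.1):
        super().__init__()
        self.hub = nn.ModuleList(hub_models)          # the hub of pre-trained models
        self.top_k = top_k
        self.noise_std = noise_std
        # pathway generator: maps an input image to one weight per hub model
        # (assumes 3-channel image input; a real generator would be a small CNN)
        self.pathway_generator = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, len(hub_models))
        )
        # per-model heads (assumes every hub model returns (B, feature_dim) features)
        self.heads = nn.ModuleList(
            [nn.Linear(feature_dim, num_classes) for _ in hub_models]
        )

    def forward(self, x):
        w = self.pathway_generator(x)                             # (B, M) pathway weights
        if self.training and self.noise_std > 0:
            w = w + self.noise_std * torch.randn_like(w)          # noisy pathway generator
        # input-level routing: activate only the top-k models per sample
        _, topk_idx = w.topk(self.top_k, dim=-1)
        mask = torch.zeros_like(w).scatter_(-1, topk_idx, 1.0)
        weights = F.softmax(w.masked_fill(mask == 0, float("-inf")), dim=-1)
        # output-level aggregation: weighted sum of the activated models' predictions
        # (for clarity every model is evaluated here; a real implementation would
        # skip models whose pathway weight is zero to save computation)
        out = 0.0
        for m, (model, head) in enumerate(zip(self.hub, self.heads)):
            out = out + weights[:, m:m + 1] * head(model(x))
        return out, weights
```

An exploration term could, for instance, reward high entropy of the batch-averaged pathway weights so that all hub models keep receiving data; the abstract does not spell out the exact loss, so this is only one plausible reading.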
Related papers
- MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.68829963458408]
We present MergeNet, which learns to bridge the gap between the parameter spaces of heterogeneous models.
The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters.
MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv Detail & Related papers (2024-04-20T08:34:39Z)
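A purely illustrative sketch of what "querying the source model's low-rank parameters" could look like: the adapter design, the attention-style querying, and the SVD truncation are assumptions, since the summary does not describe MergeNet's actual architecture.

```python
# Hypothetical parameter adapter; not MergeNet's actual design.
import torch
import torch.nn as nn


class LowRankParamAdapter(nn.Module):
    def __init__(self, src_in, tgt_in, rank=8, dim=64):
        super().__init__()
        self.rank, self.dim = rank, dim
        self.query = nn.Linear(tgt_in, dim)   # queries from the target layer's weights
        self.key = nn.Linear(src_in, dim)     # keys/values from the source's low-rank weights
        self.value = nn.Linear(src_in, tgt_in)

    def forward(self, src_weight, tgt_weight):
        # rank-r truncated SVD as a stand-in for the source model's "low-rank parameters"
        U, S, Vh = torch.linalg.svd(src_weight, full_matrices=False)
        low_rank = (U[:, :self.rank] * S[:self.rank]) @ Vh[:self.rank]   # (src_out, src_in)
        q = self.query(tgt_weight)                                       # (tgt_out, dim)
        k = self.key(low_rank)                                           # (src_out, dim)
        v = self.value(low_rank)                                         # (src_out, tgt_in)
        attn = torch.softmax(q @ k.T / self.dim ** 0.5, dim=-1)          # query the source
        return tgt_weight + attn @ v                                     # adapted target weights
```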
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
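A heavily simplified sketch of latent diffusion over flattened network weights: an autoencoder compresses weight vectors and a denoiser is trained in the latent space with a DDPM-style objective. Dimensions, the noise schedule, and the module names are assumptions for illustration only, not D2NWG's actual design.

```python
# Heavily simplified illustration; all sizes and modules are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, STEPS = 256, 1000
WEIGHT_DIM = 100_000                      # stand-in for a flattened weight-vector length
betas = torch.linspace(1e-4, 2e-2, STEPS)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

encoder = nn.Linear(WEIGHT_DIM, LATENT)   # compress flattened weights into a latent code
decoder = nn.Linear(LATENT, WEIGHT_DIM)   # map generated latents back to weights at sampling time
denoiser = nn.Sequential(nn.Linear(LATENT + 1, 512), nn.SiLU(), nn.Linear(512, LATENT))

def diffusion_loss(flat_weights: torch.Tensor) -> torch.Tensor:
    """One DDPM-style training step in the latent space of network weights."""
    z = encoder(flat_weights)                                   # (B, LATENT)
    t = torch.randint(0, STEPS, (z.size(0),))
    a = alphas_bar[t].unsqueeze(-1)                             # (B, 1)
    noise = torch.randn_like(z)
    z_noisy = a.sqrt() * z + (1 - a).sqrt() * noise             # forward noising
    t_embed = t.float().unsqueeze(-1) / STEPS                   # crude timestep conditioning
    pred = denoiser(torch.cat([z_noisy, t_embed], dim=-1))
    return F.mse_loss(pred, noise)                              # learn to predict the noise
```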
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
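A minimal sketch of the gradient-projection idea: updates driven by the "forget" data are projected onto the subspace orthogonal to gradients collected from the retained data, so unlearning interferes as little as possible with retained knowledge. The basis construction and the ascent-style update below are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative only; not the exact PGU algorithm.
import torch

def project_out(grad: torch.Tensor, retained_basis: torch.Tensor) -> torch.Tensor:
    """Remove from `grad` the component lying in span(retained_basis).

    grad:           (d,) flattened gradient computed on the forget data
    retained_basis: (d, k) orthonormal basis of the retained-data gradient space
    """
    return grad - retained_basis @ (retained_basis.T @ grad)

def unlearning_step(flat_params, forget_grad, retained_grads, lr=1e-3):
    # orthonormal basis of the retained gradients via a reduced QR decomposition
    basis, _ = torch.linalg.qr(torch.stack(retained_grads, dim=1))
    projected = project_out(forget_grad, basis)
    # ascend the forget loss only within the subspace orthogonal to retained gradients,
    # so knowledge about the remaining data is (approximately) left untouched
    return flat_params + lr * projected
```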
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This poses a critical challenge for the real-world application of foundation models: the knowledge of a foundation model has to be transferred to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Deep Inverse Reinforcement Learning for Route Choice Modeling [0.6853165736531939]
Route choice modeling is a fundamental task in transportation planning and demand forecasting.
This study proposes a general deep inverse reinforcement learning (IRL) framework for link-based route choice modeling.
Experiment results based on taxi GPS data from Shanghai, China validate the improved performance of the proposed model.
arXiv Detail & Related papers (2022-06-18T06:33:06Z)
- PAC-Net: A Model Pruning Approach to Inductive Transfer Learning [16.153557870191488]
PAC-Net is a simple yet effective approach for transfer learning based on pruning.
PAC-Net consists of three steps: Prune, Allocate, and Calibrate.
Across a varied and extensive set of inductive transfer learning experiments, we show that our method achieves state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2022-06-12T09:45:16Z)
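A schematic sketch of a prune-then-calibrate recipe in the spirit of the three steps named above: magnitude-prune a pretrained model, keep the surviving weights frozen to retain source knowledge, and fine-tune only the freed weights on the target task. The mask granularity and the exact role assigned to each step are assumptions, not PAC-Net's precise procedure.

```python
# Schematic sketch only; the real PAC-Net steps may differ.
import torch
import torch.nn as nn

def prune_masks(model: nn.Module, sparsity: float = 0.5):
    """Prune: per-tensor magnitude masks marking weights to keep (1) or free (0)."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                   # only weight matrices / conv kernels
            k = int(p.numel() * sparsity)
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
    return masks

def calibrate_step(model, masks, loss, lr=1e-3):
    """Allocate + Calibrate: update only the freed (mask == 0) weights on the target loss."""
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            free = 1.0 - masks.get(name, torch.zeros_like(p))
            p -= lr * p.grad * free                       # frozen source weights get no update
            p.grad.zero_()
```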
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower bound on the target generalization error achievable by any algorithm, as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.