Zoo-Tuning: Adaptive Transfer from a Zoo of Models
- URL: http://arxiv.org/abs/2106.15434v1
- Date: Tue, 29 Jun 2021 14:09:45 GMT
- Title: Zoo-Tuning: Adaptive Transfer from a Zoo of Models
- Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
- Abstract summary: Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
- Score: 82.9120546160422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the development of deep networks on various large-scale datasets, a
large zoo of pretrained models is available. When transferring from a model
zoo, applying classic single-model based transfer learning methods to each
source model suffers from a high computational burden and cannot fully utilize
the rich knowledge in the zoo. We propose \emph{Zoo-Tuning} to address these
challenges, which learns to adaptively transfer the parameters of pretrained
models to the target task. With the learnable channel alignment layer and
adaptive aggregation layer, Zoo-Tuning \emph{adaptively aggregates channel
aligned pretrained parameters} to derive the target model, which promotes
knowledge transfer by simultaneously adapting multiple source models to
downstream tasks. The adaptive aggregation substantially reduces the
computation cost at both training and inference. We further propose lite
Zoo-Tuning with the temporal ensemble of batch average gating values to reduce
the storage cost at inference time. We evaluate our approach on a variety
of tasks, including reinforcement learning, image classification, and facial
landmark detection. Experiment results demonstrate that the proposed adaptive
transfer learning approach can transfer knowledge from a zoo of models more
effectively and efficiently.
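As a rough illustration of the mechanism described in the abstract, the sketch below aggregates the convolution kernels of several pretrained source models with a learnable channel alignment and input-dependent gating values. The gating head, the identity-initialized alignment, and the frozen source kernels are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAggregationConv2d(nn.Module):
    """Illustrative Zoo-Tuning-style layer: channel-align the kernels of K
    pretrained source models, aggregate them with data-dependent gating
    values, and run a single convolution with the aggregated kernel."""

    def __init__(self, source_kernels):
        # source_kernels: list of K tensors, each of shape (C_out, C_in, k, k)
        super().__init__()
        self.num_sources = len(source_kernels)
        c_out, c_in, k, _ = source_kernels[0].shape
        self.padding = k // 2
        # Pretrained kernels from the zoo (kept fixed in this sketch).
        self.register_buffer("kernels", torch.stack(source_kernels))
        # Learnable channel alignment: one C_out x C_out transform per source,
        # initialized to the identity.
        self.align = nn.Parameter(
            torch.eye(c_out).unsqueeze(0).repeat(self.num_sources, 1, 1))
        # Lightweight gating head on the batch-averaged input feature.
        self.gate = nn.Linear(c_in, self.num_sources)

    def forward(self, x):                            # x: (B, C_in, H, W)
        pooled = x.mean(dim=(0, 2, 3))               # batch-average descriptor
        a = F.softmax(self.gate(pooled), dim=-1)     # gating values, shape (K,)
        # Align the output channels of each source kernel, then aggregate.
        aligned = torch.einsum("koc,kcihw->koihw", self.align, self.kernels)
        aggregated = (a.view(-1, 1, 1, 1, 1) * aligned).sum(dim=0)
        return F.conv2d(x, aggregated, padding=self.padding)

# Usage: aggregate the first conv layers of four hypothetical pretrained nets.
zoo = [torch.randn(64, 3, 3, 3) for _ in range(4)]
layer = AdaptiveAggregationConv2d(zoo)
features = layer(torch.randn(8, 3, 32, 32))          # -> (8, 64, 32, 32)
```

Lite Zoo-Tuning, as described in the abstract, would further replace the per-batch gating values at inference with their temporal ensemble, so that only the single aggregated kernel has to be stored.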
Related papers
- Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a three-phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE).
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct the unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z)
- Efficient Transferability Assessment for Selection of Pre-trained Detectors [63.21514888618542]
This paper studies the efficient transferability assessment of pre-trained object detectors.
We build up a detector transferability benchmark which contains a large and diverse zoo of pre-trained detectors.
Experimental results demonstrate that our method outperforms other state-of-the-art approaches in assessing transferability.
arXiv Detail & Related papers (2024-03-14T14:23:23Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
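The summary suggests bottleneck adaptation modules whose projection weights are reused across layers; the sketch below is one plausible reading of such a parameter-sharing scheme. The `SharedBottleneckAdapter` name, the per-layer rescaling vector, and the residual placement are assumptions rather than the exact ARC design.

```python
import torch
import torch.nn as nn

class SharedBottleneckAdapter(nn.Module):
    """Adapter whose down/up projections are shared across all transformer
    layers; each layer only learns a tiny per-layer rescaling vector."""

    def __init__(self, shared_down: nn.Linear, shared_up: nn.Linear):
        super().__init__()
        self.down, self.up = shared_down, shared_up   # shared, reused by every layer
        self.scale = nn.Parameter(torch.ones(shared_down.out_features))  # per-layer

    def forward(self, x):
        return x + self.up(self.scale * torch.relu(self.down(x)))

# Usage: one pair of projections, re-composed into adapters for 12 layers.
down, up = nn.Linear(768, 16), nn.Linear(16, 768)
adapters = [SharedBottleneckAdapter(down, up) for _ in range(12)]
```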
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
To our knowledge, this is the first time a simple transformer-based model has performed competitively with both kinds of approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-09-29T12:53:58Z)
- Hyper-Representations for Pre-Training and Transfer Learning [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights as pre-training.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-07-22T09:01:21Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
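A minimal sketch of the co-finetuning idea, assuming a shared backbone, one classification head per dataset, and a plain summed loss; the sampling and loss-weighting details are assumptions, not the paper's recipe.

```python
import torch.nn.functional as F

def cofinetune_step(backbone, heads, batches, optimizer):
    """One co-finetuning step: the shared backbone is updated on batches from
    several "upstream" and "downstream" datasets at the same time."""
    optimizer.zero_grad()
    total_loss = 0.0
    for task, (x, y) in batches.items():          # e.g. {"upstream_video": (x, y), ...}
        logits = heads[task](backbone(x))         # shared features, task-specific head
        total_loss = total_loss + F.cross_entropy(logits, y)
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```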
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
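Concretely, with a diagonal Fisher estimate per model, the merge reduces to a per-parameter weighted average; a minimal sketch, assuming the Fisher values have already been estimated.

```python
import torch

def fisher_merge(param_sets, fisher_sets, eps=1e-8):
    """Merge M models parameter-wise: theta* = sum_i F_i * theta_i / sum_i F_i.
    Both arguments are lists of {name: tensor} dicts, one per model, with
    matching names and shapes."""
    merged = {}
    for name in param_sets[0]:
        weighted = sum(f[name] * p[name] for p, f in zip(param_sets, fisher_sets))
        total = sum(f[name] for f in fisher_sets)
        merged[name] = weighted / (total + eps)
    return merged

# Usage with two toy "models":
p1 = {"w": torch.tensor([1.0, 2.0])}; f1 = {"w": torch.tensor([1.0, 0.1])}
p2 = {"w": torch.tensor([3.0, 0.0])}; f2 = {"w": torch.tensor([1.0, 0.9])}
print(fisher_merge([p1, p2], [f1, f2])["w"])      # -> tensor([2.0000, 0.2000])
```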
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
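A minimal sketch of the frozen-backbone idea, in which small injected networks are the only trainable parameters; the `ResidualPatch` module below is a hypothetical illustration, not PeterRec's exact patch architecture.

```python
import torch.nn as nn

class ResidualPatch(nn.Module):
    """Wrap a pretrained layer with a small re-learned network; the pretrained
    parameters stay unaltered and only the patch is trained per task."""

    def __init__(self, pretrained_layer: nn.Module, dim: int, bottleneck: int = 16):
        super().__init__()
        self.layer = pretrained_layer
        for p in self.layer.parameters():
            p.requires_grad = False               # keep pretrained weights frozen
        self.patch = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, x):
        h = self.layer(x)
        return h + self.patch(h)                  # residual re-learned network
```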
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.