Zoo-Tuning: Adaptive Transfer from a Zoo of Models
- URL: http://arxiv.org/abs/2106.15434v1
- Date: Tue, 29 Jun 2021 14:09:45 GMT
- Title: Zoo-Tuning: Adaptive Transfer from a Zoo of Models
- Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long
- Abstract summary: Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
- Score: 82.9120546160422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the development of deep networks on various large-scale datasets, a
large zoo of pretrained models is available. When transferring from a model
zoo, applying classic single-model based transfer learning methods to each
source model suffers from a high computational burden and cannot fully utilize
the rich knowledge in the zoo. We propose \emph{Zoo-Tuning} to address these
challenges, which learns to adaptively transfer the parameters of pretrained
models to the target task. With the learnable channel alignment layer and
adaptive aggregation layer, Zoo-Tuning \emph{adaptively aggregates channel
aligned pretrained parameters} to derive the target model, which promotes
knowledge transfer by simultaneously adapting multiple source models to
downstream tasks. The adaptive aggregation substantially reduces the
computation cost at both training and inference. We further propose lite
Zoo-Tuning with the temporal ensemble of batch average gating values to reduce
the storage cost at inference time. We evaluate our approach on a variety
of tasks, including reinforcement learning, image classification, and facial
landmark detection. Experiment results demonstrate that the proposed adaptive
transfer learning approach can transfer knowledge from a zoo of models more
effectively and efficiently.
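As a rough illustration of the mechanism described in the abstract, the sketch below aggregates the convolution kernels of several pretrained source models with a learnable channel alignment and input-dependent gating values. The gating head, the identity-initialized alignment, and the frozen source kernels are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAggregationConv2d(nn.Module):
    """Illustrative Zoo-Tuning-style layer: channel-align the kernels of K
    pretrained source models, aggregate them with data-dependent gating
    values, and run a single convolution with the aggregated kernel."""

    def __init__(self, source_kernels):
        # source_kernels: list of K tensors, each of shape (C_out, C_in, k, k)
        super().__init__()
        self.num_sources = len(source_kernels)
        c_out, c_in, k, _ = source_kernels[0].shape
        self.padding = k // 2
        # Pretrained kernels from the zoo (kept fixed in this sketch).
        self.register_buffer("kernels", torch.stack(source_kernels))
        # Learnable channel alignment: one C_out x C_out transform per source,
        # initialized to the identity.
        self.align = nn.Parameter(
            torch.eye(c_out).unsqueeze(0).repeat(self.num_sources, 1, 1))
        # Lightweight gating head on the batch-averaged input feature.
        self.gate = nn.Linear(c_in, self.num_sources)

    def forward(self, x):                            # x: (B, C_in, H, W)
        pooled = x.mean(dim=(0, 2, 3))               # batch-average descriptor
        a = F.softmax(self.gate(pooled), dim=-1)     # gating values, shape (K,)
        # Align the output channels of each source kernel, then aggregate.
        aligned = torch.einsum("koc,kcihw->koihw", self.align, self.kernels)
        aggregated = (a.view(-1, 1, 1, 1, 1) * aligned).sum(dim=0)
        return F.conv2d(x, aggregated, padding=self.padding)

# Usage: aggregate the first conv layers of four hypothetical pretrained nets.
zoo = [torch.randn(64, 3, 3, 3) for _ in range(4)]
layer = AdaptiveAggregationConv2d(zoo)
features = layer(torch.randn(8, 3, 32, 32))          # -> (8, 64, 32, 32)
```

Lite Zoo-Tuning, as described in the abstract, would further replace the per-batch gating values at inference with their temporal ensemble, so that only the single aggregated kernel has to be stored.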
Related papers
- Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a three-phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE).
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct the unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z)
- Efficient Transferability Assessment for Selection of Pre-trained Detectors [63.21514888618542]
This paper studies the efficient transferability assessment of pre-trained object detectors.
We build up a detector transferability benchmark which contains a large and diverse zoo of pre-trained detectors.
Experimental results demonstrate that our method outperforms other state-of-the-art approaches in assessing transferability.
arXiv Detail & Related papers (2024-03-14T14:23:23Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
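The summary suggests bottleneck adaptation modules whose projection weights are reused across layers; the sketch below is one plausible reading of such a parameter-sharing scheme. The `SharedBottleneckAdapter` name, the per-layer rescaling vector, and the residual placement are assumptions rather than the exact ARC design.

```python
import torch
import torch.nn as nn

class SharedBottleneckAdapter(nn.Module):
    """Adapter whose down/up projections are shared across all transformer
    layers; each layer only learns a tiny per-layer rescaling vector."""

    def __init__(self, shared_down: nn.Linear, shared_up: nn.Linear):
        super().__init__()
        self.down, self.up = shared_down, shared_up   # shared, reused by every layer
        self.scale = nn.Parameter(torch.ones(shared_down.out_features))  # per-layer

    def forward(self, x):
        return x + self.up(self.scale * torch.relu(self.down(x)))

# Usage: one pair of projections, re-composed into adapters for 12 layers.
down, up = nn.Linear(768, 16), nn.Linear(16, 768)
adapters = [SharedBottleneckAdapter(down, up) for _ in range(12)]
```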
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
To our knowledge, this is the first time a simple transformer-based model has performed competitively with both kinds of approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-09-29T12:53:58Z)
- Hyper-Representations for Pre-Training and Transfer Learning [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights as pre-training.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-07-22T09:01:21Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
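A minimal sketch of the co-finetuning idea, assuming a shared backbone, one classification head per dataset, and a plain summed loss; the sampling and loss-weighting details are assumptions, not the paper's recipe.

```python
import torch.nn.functional as F

def cofinetune_step(backbone, heads, batches, optimizer):
    """One co-finetuning step: the shared backbone is updated on batches from
    several "upstream" and "downstream" datasets at the same time."""
    optimizer.zero_grad()
    total_loss = 0.0
    for task, (x, y) in batches.items():          # e.g. {"upstream_video": (x, y), ...}
        logits = heads[task](backbone(x))         # shared features, task-specific head
        total_loss = total_loss + F.cross_entropy(logits, y)
    total_loss.backward()
    optimizer.step()
    return float(total_loss)
```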
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
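Concretely, with a diagonal Fisher estimate per model, the merge reduces to a per-parameter weighted average; a minimal sketch, assuming the Fisher values have already been estimated.

```python
import torch

def fisher_merge(param_sets, fisher_sets, eps=1e-8):
    """Merge M models parameter-wise: theta* = sum_i F_i * theta_i / sum_i F_i.
    Both arguments are lists of {name: tensor} dicts, one per model, with
    matching names and shapes."""
    merged = {}
    for name in param_sets[0]:
        weighted = sum(f[name] * p[name] for p, f in zip(param_sets, fisher_sets))
        total = sum(f[name] for f in fisher_sets)
        merged[name] = weighted / (total + eps)
    return merged

# Usage with two toy "models":
p1 = {"w": torch.tensor([1.0, 2.0])}; f1 = {"w": torch.tensor([1.0, 0.1])}
p2 = {"w": torch.tensor([3.0, 0.0])}; f2 = {"w": torch.tensor([1.0, 0.9])}
print(fisher_merge([p1, p2], [f1, f2])["w"])      # -> tensor([2.0000, 0.2000])
```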
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
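A minimal sketch of the frozen-backbone idea, in which small injected networks are the only trainable parameters; the `ResidualPatch` module below is a hypothetical illustration, not PeterRec's exact patch architecture.

```python
import torch.nn as nn

class ResidualPatch(nn.Module):
    """Wrap a pretrained layer with a small re-learned network; the pretrained
    parameters stay unaltered and only the patch is trained per task."""

    def __init__(self, pretrained_layer: nn.Module, dim: int, bottleneck: int = 16):
        super().__init__()
        self.layer = pretrained_layer
        for p in self.layer.parameters():
            p.requires_grad = False               # keep pretrained weights frozen
        self.patch = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, x):
        h = self.layer(x)
        return h + self.patch(h)                  # residual re-learned network
```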
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.