Scalable Transfer Learning with Expert Models
- URL: http://arxiv.org/abs/2009.13239v1
- Date: Mon, 28 Sep 2020 12:07:10 GMT
- Title: Scalable Transfer Learning with Expert Models
- Authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Cedric Renggli,
André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby
- Abstract summary: We explore the use of expert representations for transfer with a simple, yet effective, strategy.
We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task.
This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer.
- Score: 32.48351077884257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer of pre-trained representations can improve sample efficiency and
reduce computational requirements for new tasks. However, representations used
for transfer are usually generic, and are not tailored to a particular
distribution of downstream tasks. We explore the use of expert representations
for transfer with a simple, yet effective, strategy. We train a diverse set of
experts by exploiting existing label structures, and use cheap-to-compute
performance proxies to select the relevant expert for each target task. This
strategy scales the process of transferring to new tasks, since it does not
revisit the pre-training data during transfer. Accordingly, it requires little
extra compute per target task, and results in a speed-up of 2-3 orders of
magnitude compared to competing approaches. Further, we provide an
adapter-based architecture able to compress many experts into a single model.
We evaluate our approach on two different data sources and demonstrate that it
outperforms baselines on over 20 diverse vision tasks in both cases.
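The selection step described in the abstract lends itself to a short illustration. The Python sketch below is a hypothetical rendering, not the authors' released code: the `knn_proxy_score` and `select_expert` helpers, the scikit-learn dependency, and the choice of cross-validated kNN accuracy as the cheap-to-compute proxy are assumptions made for this example.

```python
# Illustrative sketch of proxy-based expert selection (not the authors' code).
# Assumption: each "expert" exposes a frozen feature extractor, and a cheap
# kNN accuracy on the target training set serves as the performance proxy.
from typing import Callable, Dict

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def knn_proxy_score(features: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Cheap proxy: cross-validated kNN accuracy on frozen expert features."""
    clf = KNeighborsClassifier(n_neighbors=k)
    return float(cross_val_score(clf, features, labels, cv=3).mean())


def select_expert(
    experts: Dict[str, Callable[[np.ndarray], np.ndarray]],
    images: np.ndarray,
    labels: np.ndarray,
) -> str:
    """Embed the target data with every expert and keep the highest-proxy one.

    The pre-training data is never revisited: only the (small) target training
    set is embedded once per expert, which is what keeps the procedure cheap.
    """
    scores = {
        name: knn_proxy_score(embed(images), labels)
        for name, embed in experts.items()
    }
    return max(scores, key=scores.get)
```

Only the selected expert (or its adapter, in the compressed multi-expert model) would then be fine-tuned on the target task, which is what keeps the per-task compute low.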
Related papers
- Transfer Learning for Structured Pruning under Limited Task Data [15.946734013984184]
We propose a framework which combines structured pruning with transfer learning to reduce the need for task-specific data.
We demonstrate that our framework results in pruned models with improved generalization over strong baselines.
arXiv Detail & Related papers (2023-11-10T20:23:35Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (a minimal layer-freezing sketch appears after this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data results in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Diversified Dynamic Routing for Vision Tasks [36.199659460868496]
We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
arXiv Detail & Related papers (2022-09-26T23:27:51Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- CrossTransformers: spatially-aware few-shot transfer [92.33252608837947]
Given new tasks with very little data, modern vision systems degrade remarkably quickly.
We show how the neural network representations which underpin modern vision systems are subject to supervision collapse.
We propose self-supervised learning to encourage general-purpose features that transfer better.
arXiv Detail & Related papers (2020-07-22T15:37:08Z)
- Representation Transfer by Optimal Transport [34.77292648424614]
We use optimal transport to quantify the match between two representations.
This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher (a minimal Sinkhorn-based sketch appears after this list).
arXiv Detail & Related papers (2020-07-13T23:42:06Z)
- MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures [61.73533544385352]
We propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data.
As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures.
arXiv Detail & Related papers (2020-06-13T02:54:59Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
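For the "Optimal transfer protocol by incremental layer defrosting" entry above, the protocol of keeping only a prefix of a pre-trained network frozen can be sketched as follows. This is a hypothetical PyTorch example: the two-block backbone, the `freeze_up_to` helper, and all dimensions are invented for illustration and are not taken from that paper.

```python
# Hypothetical sketch of the "freeze a prefix, fine-tune the rest" protocol.
# The backbone below is a stand-in for a pre-trained feature extractor.
import torch
from torch import nn

backbone = nn.Sequential(                              # pretend these weights are pre-trained
    nn.Sequential(nn.Linear(128, 256), nn.ReLU()),     # block 0 (early layers)
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()),     # block 1 (later layers)
)
head = nn.Linear(256, 10)                              # new task-specific head


def freeze_up_to(model: nn.Sequential, num_frozen_blocks: int) -> None:
    """Freeze the first `num_frozen_blocks` blocks; leave the rest trainable."""
    for i, block in enumerate(model):
        for p in block.parameters():
            p.requires_grad = i >= num_frozen_blocks


freeze_up_to(backbone, num_frozen_blocks=1)            # keep only the early block frozen
params = [p for p in list(backbone.parameters()) + list(head.parameters())
          if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=1e-3)

x = torch.randn(4, 128)
loss = nn.functional.cross_entropy(head(backbone(x)), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```

Sweeping `num_frozen_blocks` from all blocks down to zero spans the range of protocols that entry compares, from a fully frozen feature extractor to full fine-tuning.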
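For the "Representation Transfer by Optimal Transport" entry, the regularizer it describes can be approximated with an entropy-regularized (Sinkhorn) optimal-transport cost between student and teacher feature batches. The NumPy sketch below is an illustration under stated assumptions: the Sinkhorn solver, the squared-Euclidean cost, the cost normalization, and the `sinkhorn_distance` name are not taken from that paper.

```python
# Hypothetical Sinkhorn-based OT cost between student and teacher features.
import numpy as np


def sinkhorn_distance(student: np.ndarray, teacher: np.ndarray,
                      eps: float = 0.1, n_iters: int = 200) -> float:
    """Entropy-regularized OT cost between two batches of feature vectors."""
    n, m = student.shape[0], teacher.shape[0]
    # Pairwise squared-Euclidean costs, rescaled to [0, 1] to keep exp() stable.
    cost = ((student[:, None, :] - teacher[None, :, :]) ** 2).sum(-1)
    cost = cost / (cost.max() + 1e-12)
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)              # uniform marginal over student samples
    b = np.full(m, 1.0 / m)              # uniform marginal over teacher samples
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


# Used as a regularizer: total_loss = task_loss + lam * sinkhorn_distance(fs, ft)
rng = np.random.default_rng(0)
print(sinkhorn_distance(rng.normal(size=(32, 64)), rng.normal(size=(32, 64))))
```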
This list is automatically generated from the titles and abstracts of the papers in this site.