Scalable Transfer Learning with Expert Models
- URL: http://arxiv.org/abs/2009.13239v1
- Date: Mon, 28 Sep 2020 12:07:10 GMT
- Title: Scalable Transfer Learning with Expert Models
- Authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Cedric Renggli,
André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby
- Abstract summary: We explore the use of expert representations for transfer with a simple, yet effective, strategy.
We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task.
This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer.
- Score: 32.48351077884257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer of pre-trained representations can improve sample efficiency and
reduce computational requirements for new tasks. However, representations used
for transfer are usually generic, and are not tailored to a particular
distribution of downstream tasks. We explore the use of expert representations
for transfer with a simple, yet effective, strategy. We train a diverse set of
experts by exploiting existing label structures, and use cheap-to-compute
performance proxies to select the relevant expert for each target task. This
strategy scales the process of transferring to new tasks, since it does not
revisit the pre-training data during transfer. Accordingly, it requires little
extra compute per target task, and results in a speed-up of 2-3 orders of
magnitude compared to competing approaches. Further, we provide an
adapter-based architecture able to compress many experts into a single model.
We evaluate our approach on two different data sources and demonstrate that it
outperforms baselines on over 20 diverse vision tasks in both cases.
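The selection step described in the abstract lends itself to a short illustration. The Python sketch below is a hypothetical rendering, not the authors' released code: the `knn_proxy_score` and `select_expert` helpers, the scikit-learn dependency, and the choice of cross-validated kNN accuracy as the cheap-to-compute proxy are assumptions made for this example.

```python
# Illustrative sketch of proxy-based expert selection (not the authors' code).
# Assumption: each "expert" exposes a frozen feature extractor, and a cheap
# kNN accuracy on the target training set serves as the performance proxy.
from typing import Callable, Dict

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def knn_proxy_score(features: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Cheap proxy: cross-validated kNN accuracy on frozen expert features."""
    clf = KNeighborsClassifier(n_neighbors=k)
    return float(cross_val_score(clf, features, labels, cv=3).mean())


def select_expert(
    experts: Dict[str, Callable[[np.ndarray], np.ndarray]],
    images: np.ndarray,
    labels: np.ndarray,
) -> str:
    """Embed the target data with every expert and keep the highest-proxy one.

    The pre-training data is never revisited: only the (small) target training
    set is embedded once per expert, which is what keeps the procedure cheap.
    """
    scores = {
        name: knn_proxy_score(embed(images), labels)
        for name, embed in experts.items()
    }
    return max(scores, key=scores.get)
```

Only the selected expert (or its adapter, in the compressed multi-expert model) would then be fine-tuned on the target task, which is what keeps the per-task compute low.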
Related papers
- Transfer Learning for Structured Pruning under Limited Task Data [15.946734013984184]
We propose a framework which combines structured pruning with transfer learning to reduce the need for task-specific data.
We demonstrate that our framework results in pruned models with improved generalization over strong baselines.
arXiv Detail & Related papers (2023-11-10T20:23:35Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (a minimal layer-freezing sketch appears after this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data results in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Diversified Dynamic Routing for Vision Tasks [36.199659460868496]
We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding a relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
arXiv Detail & Related papers (2022-09-26T23:27:51Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- CrossTransformers: spatially-aware few-shot transfer [92.33252608837947]
Given new tasks with very little data, modern vision systems degrade remarkably quickly.
We show how the neural network representations which underpin modern vision systems are subject to supervision collapse.
We propose self-supervised learning to encourage general-purpose features that transfer better.
arXiv Detail & Related papers (2020-07-22T15:37:08Z)
- Representation Transfer by Optimal Transport [34.77292648424614]
We use optimal transport to quantify the match between two representations.
This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher (a minimal Sinkhorn-based sketch appears after this list).
arXiv Detail & Related papers (2020-07-13T23:42:06Z)
- MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures [61.73533544385352]
We propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data.
As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures.
arXiv Detail & Related papers (2020-06-13T02:54:59Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
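For the "Optimal transfer protocol by incremental layer defrosting" entry above, the protocol of keeping only a prefix of a pre-trained network frozen can be sketched as follows. This is a hypothetical PyTorch example: the two-block backbone, the `freeze_up_to` helper, and all dimensions are invented for illustration and are not taken from that paper.

```python
# Hypothetical sketch of the "freeze a prefix, fine-tune the rest" protocol.
# The backbone below is a stand-in for a pre-trained feature extractor.
import torch
from torch import nn

backbone = nn.Sequential(                              # pretend these weights are pre-trained
    nn.Sequential(nn.Linear(128, 256), nn.ReLU()),     # block 0 (early layers)
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()),     # block 1 (later layers)
)
head = nn.Linear(256, 10)                              # new task-specific head


def freeze_up_to(model: nn.Sequential, num_frozen_blocks: int) -> None:
    """Freeze the first `num_frozen_blocks` blocks; leave the rest trainable."""
    for i, block in enumerate(model):
        for p in block.parameters():
            p.requires_grad = i >= num_frozen_blocks


freeze_up_to(backbone, num_frozen_blocks=1)            # keep only the early block frozen
params = [p for p in list(backbone.parameters()) + list(head.parameters())
          if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=1e-3)

x = torch.randn(4, 128)
loss = nn.functional.cross_entropy(head(backbone(x)), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```

Sweeping `num_frozen_blocks` from all blocks down to zero spans the range of protocols that entry compares, from a fully frozen feature extractor to full fine-tuning.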
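For the "Representation Transfer by Optimal Transport" entry, the regularizer it describes can be approximated with an entropy-regularized (Sinkhorn) optimal-transport cost between student and teacher feature batches. The NumPy sketch below is an illustration under stated assumptions: the Sinkhorn solver, the squared-Euclidean cost, the cost normalization, and the `sinkhorn_distance` name are not taken from that paper.

```python
# Hypothetical Sinkhorn-based OT cost between student and teacher features.
import numpy as np


def sinkhorn_distance(student: np.ndarray, teacher: np.ndarray,
                      eps: float = 0.1, n_iters: int = 200) -> float:
    """Entropy-regularized OT cost between two batches of feature vectors."""
    n, m = student.shape[0], teacher.shape[0]
    # Pairwise squared-Euclidean costs, rescaled to [0, 1] to keep exp() stable.
    cost = ((student[:, None, :] - teacher[None, :, :]) ** 2).sum(-1)
    cost = cost / (cost.max() + 1e-12)
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)              # uniform marginal over student samples
    b = np.full(m, 1.0 / m)              # uniform marginal over teacher samples
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


# Used as a regularizer: total_loss = task_loss + lam * sinkhorn_distance(fs, ft)
rng = np.random.default_rng(0)
print(sinkhorn_distance(rng.normal(size=(32, 64)), rng.normal(size=(32, 64))))
```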
This list is automatically generated from the titles and abstracts of the papers in this site.