Related papers: Which Model to Transfer? Finding the Needle in the Growing Haystack

URL: http://arxiv.org/abs/2010.06402v2
Date: Fri, 25 Mar 2022 08:27:57 GMT
Title: Which Model to Transfer? Finding the Needle in the Growing Haystack
Authors: Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic
Abstract summary: We provide a formalization of this problem through a familiar notion of regret. We show that both task-agnostic and task-aware methods can yield high regret. We then propose a simple and efficient hybrid search strategy which outperforms the existing approaches.
Score: 27.660318887140203
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline. The emergence of rich model repositories, such as TensorFlow Hub, enables the practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. We provide a formalization of this problem through a familiar notion of regret and introduce the predominant strategies, namely task-agnostic (e.g. ranking models by their ImageNet performance) and task-aware search strategies (such as linear or kNN evaluation). We conduct a large-scale empirical study and show that both task-agnostic and task-aware methods can yield high regret. We then propose a simple and computationally efficient hybrid search strategy which outperforms the existing approaches. We highlight the practical benefits of the proposed solution on a set of 19 diverse vision tasks.
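
The abstract frames model selection in terms of regret and contrasts task-agnostic, task-aware, and hybrid search strategies. The sketch below is one possible reading of that setup, not the authors' implementation: it assumes regret is the gap between the best fine-tuned accuracy in the whole pool and the best fine-tuned accuracy among the models a strategy selects, and it assumes a hybrid that spends one slot on the top task-agnostic model and the remaining budget on the top task-aware candidates. All function names, model names, and numbers are hypothetical and do not reproduce any result from the paper.

```python
from typing import Dict, List


def regret(selected: List[str], finetune_acc: Dict[str, float]) -> float:
    """Assumed definition: best accuracy in the pool minus best accuracy among selected models."""
    best_overall = max(finetune_acc.values())
    best_selected = max(finetune_acc[m] for m in selected)
    return best_overall - best_selected


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k model names with the highest scores."""
    return sorted(scores, key=lambda m: scores[m], reverse=True)[:k]


def hybrid(upstream: Dict[str, float], proxy: Dict[str, float], budget: int) -> List[str]:
    """Assumed hybrid: 1 model by task-agnostic rank, budget - 1 by task-aware proxy."""
    first = top_k(upstream, 1)
    rest = {m: s for m, s in proxy.items() if m not in first}
    return first + top_k(rest, budget - 1)


# Hypothetical toy pool of four pre-trained models.
# Task-agnostic score, e.g. upstream ImageNet accuracy.
upstream = {"A": 0.80, "B": 0.78, "C": 0.70, "D": 0.65}
# Task-aware proxy, e.g. linear or kNN evaluation on the downstream task.
proxy = {"A": 0.60, "B": 0.55, "C": 0.72, "D": 0.50}
# Ground-truth accuracy after fine-tuning each model on the downstream task.
finetune_acc = {"A": 0.85, "B": 0.83, "C": 0.90, "D": 0.70}

budget = 2
strategies = {
    "task-agnostic": top_k(upstream, budget),
    "task-aware": top_k(proxy, budget),
    "hybrid": hybrid(upstream, proxy, budget),
}
for name, selected in strategies.items():
    print(f"{name}: selected={selected}, regret={regret(selected, finetune_acc):.3f}")
```

In this made-up pool the task-agnostic ranking misses model C, which has a mediocre upstream score but fine-tunes best, so a purely task-agnostic selection incurs nonzero regret. In practice the fine-tuned accuracies are unknown at selection time, which is why regret can only be measured after the fact in a study like the one the abstract describes.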

Related papers

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach only leverages a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z)
One-Shot Pruning for Fast-adapting Pre-trained Models on Devices [28.696989086706186]
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Deploying these models on low-capability devices still requires an effective approach, such as model pruning. We present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task.
arXiv Detail & Related papers (2023-07-10T06:44:47Z)
Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards. Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks. We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task. We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
Prototype-guided Cross-task Knowledge Distillation for Large-scale Models [103.04711721343278]
Cross-task knowledge distillation helps to train a small student model to obtain a competitive performance. We propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios.
arXiv Detail & Related papers (2022-12-26T15:00:42Z)
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer. State of the art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks. We propose an evolutionary method that can generate a large scale multitask model and can support the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z)
SHiFT: An Efficient, Flexible Search Engine for Transfer Learning [16.289623977712086]
Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. We propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning.
arXiv Detail & Related papers (2022-04-04T13:16:46Z)
What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task. This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage. The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future. Our model is capable of solving sparse-reward problems with a very high sample efficiency. We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy. We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.