Scalable Diverse Model Selection for Accessible Transfer Learning
- URL: http://arxiv.org/abs/2111.06977v1
- Date: Fri, 12 Nov 2021 22:53:28 GMT
- Title: Scalable Diverse Model Selection for Accessible Transfer Learning
- Authors: Daniel Bolya, Rohit Mittapalli, Judy Hoffman
- Abstract summary: We find that existing model selection and transferability estimation methods perform poorly here.
We introduce simple techniques to improve the performance and speed of these algorithms.
We create PARC, which outperforms all other methods on diverse model selection.
- Score: 21.194070453269592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the preponderance of pretrained deep learning models available
off-the-shelf from model banks today, finding the best weights to fine-tune to
your use-case can be a daunting task. Several methods have recently been
proposed to find good models for transfer learning, but they either don't scale
well to large model banks or don't perform well on the diversity of
off-the-shelf models. Ideally, the question we want to answer is, "given some
data and a source model, can you quickly predict the model's accuracy after
fine-tuning?" In this paper, we formalize this setting as "Scalable Diverse
Model Selection" and propose several benchmarks for evaluating on this task. We
find that existing model selection and transferability estimation methods
perform poorly here and analyze why this is the case. We then introduce simple
techniques to improve the performance and speed of these algorithms. Finally,
we iterate on existing methods to create PARC, which outperforms all other
methods on diverse model selection. We have released the benchmarks and method
code in the hope of inspiring future work in model selection for accessible
transfer learning.
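The question the paper formalizes — "given some data and a source model, can you quickly predict the model's accuracy after fine-tuning?" — can be sketched as scoring each candidate's features against the target labels without any training. The toy scorer below correlates pairwise feature similarity with pairwise label agreement, in the general spirit of feature/label-comparison methods such as PARC; the function names and the scoring details are illustrative assumptions, not the paper's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def transfer_score(features, labels):
    """Correlate pairwise feature similarity with pairwise label agreement.

    features: one feature vector (list of floats) per probe example
    labels:   one class label per probe example
    A higher score means the model's features already group the target
    classes together, a cheap proxy for post-fine-tuning accuracy.
    """
    feat_sims, label_sims = [], []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            # cosine similarity between the two feature vectors
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            ni = math.sqrt(sum(a * a for a in features[i]))
            nj = math.sqrt(sum(a * a for a in features[j]))
            feat_sims.append(dot / (ni * nj))
            label_sims.append(1.0 if labels[i] == labels[j] else 0.0)
    return pearson(feat_sims, label_sims)


def select_model(extractors, probe_x, probe_y):
    """Rank candidate feature extractors by score -- no fine-tuning needed."""
    scores = {name: transfer_score([f(x) for x in probe_x], probe_y)
              for name, f in extractors.items()}
    return max(scores, key=scores.get), scores
```

Because each candidate is scored from a single forward pass over a small probe set, this kind of scorer scales to large model banks far more cheaply than fine-tuning every candidate.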
Related papers
- Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection [40.85209520973634]
An ideal model selection scheme should support two operations efficiently over a large pool of candidate models.
Previous solutions to model selection require high computational complexity for at least one of these two operations.
We present Standardized Embedder, an empirical realization of isolated model embedding.
arXiv Detail & Related papers (2024-06-11T17:57:49Z)
- A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology is shown to select a high-performing model about 3x faster than conventional baseline methods.
arXiv Detail & Related papers (2024-03-28T14:44:44Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
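Merging models "in their parameter space" means combining the models' weights directly, with no training data involved. A minimal sketch is plain weighted averaging of named parameters; note this is only an illustrative baseline under that assumption — deriving better per-layer merge coefficients is the paper's actual contribution.

```python
def merge_weights(models, mix=None):
    """Merge several models in parameter space by weighted averaging.

    models: list of dicts mapping parameter name -> flat list of floats
            (all models are assumed to share the same architecture/keys)
    mix:    optional per-model mixing coefficients; defaults to uniform
    """
    if mix is None:
        mix = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        params = [m[name] for m in models]
        # element-wise weighted average across the candidate models
        merged[name] = [sum(w * p[i] for w, p in zip(mix, params))
                        for i in range(len(params[0]))]
    return merged
```

With uniform `mix`, two fine-tuned checkpoints collapse into a single model whose every parameter is the midpoint of the originals; non-uniform `mix` recovers one input model at the extremes.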
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that no single model works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines with more sophisticated model designs.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine-tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of the AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
- Few-Shot Lifelong Learning [35.05196800623617]
Few-Shot Lifelong Learning enables deep learning models to perform lifelong/continual learning on few-shot data.
Our method selects very few parameters from the model for training every new set of classes instead of training the full model.
We experimentally show that our method significantly outperforms existing methods on the miniImageNet, CIFAR-100, and CUB-200 datasets.
arXiv Detail & Related papers (2021-03-01T13:26:57Z)
- A linearized framework and a new benchmark for model selection for fine-tuning [112.20527122513668]
Fine-tuning from a collection of models pre-trained on different domains is emerging as a technique to improve test accuracy in the low-data regime.
We introduce two new baselines for model selection -- Label-Gradient and Label-Feature Correlation.
Our benchmark highlights the accuracy gain from using a model zoo compared to fine-tuning ImageNet models.
arXiv Detail & Related papers (2021-01-29T21:57:15Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout distillation, and most allocate an equal weight to every teacher.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.