Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs
- URL: http://arxiv.org/abs/2110.10545v1
- Date: Wed, 20 Oct 2021 12:59:23 GMT
- Title: Ranking and Tuning Pre-trained Models: A New Paradigm of Exploiting Model Hubs
- Authors: Kaichao You, Yong Liu, Jianmin Wang, Michael I. Jordan, Mingsheng Long
- Abstract summary: We propose a new paradigm of exploiting model hubs by ranking and tuning pre-trained models.
The best-ranked PTM can be fine-tuned and deployed if we have no preference for the model's architecture.
The tuning part introduces a novel method for tuning multiple PTMs, which surpasses dedicated methods.
- Score: 136.4492678691406
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained model hubs with many pre-trained models (PTMs) have been a
cornerstone in deep learning. Although built at a high cost, they are in fact
\emph{under-exploited}: practitioners usually pick one PTM from the provided
model hub by popularity, and then fine-tune the PTM to solve the target task.
This na\"ve but common practice poses two obstacles to sufficiently exploiting
pre-trained model hubs: (1) the PTM selection procedure has no optimality
guarantee; (2) only one PTM is used while the rest PTMs are overlooked.
Ideally, maximally exploiting pre-trained model hubs would require trying all
combinations of PTMs and extensively fine-tuning each combination, which incurs
an exponential number of combinations and an unaffordable computational budget. In
this paper, we propose a new paradigm of exploiting model hubs by ranking and
tuning pre-trained models: (1) Our conference work~\citep{you_logme:_2021}
proposed LogME to estimate the maximum value of label evidence given features
extracted by pre-trained models, which can rank all the PTMs in a model hub for
various types of PTMs and tasks \emph{before fine-tuning}. (2) The best-ranked
PTM can be fine-tuned and deployed if we have no preference for the model's
architecture, or the target PTM can be tuned with the top-K ranked PTMs via the
proposed B-Tuning algorithm. The ranking part is based on the conference paper,
and we complete its theoretical analysis (convergence proof of the heuristic
evidence maximization procedure, and the influence of feature dimension) in
this paper. The tuning part introduces a novel Bayesian Tuning (B-Tuning)
method for tuning multiple PTMs, which surpasses dedicated methods designed for
tuning homogeneous PTMs and sets a new state of the art for tuning heterogeneous
PTMs. We believe the new paradigm of exploiting PTM hubs will be of interest to
a large audience in the community.
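To make the ranking step concrete, here is a minimal NumPy sketch of a LogME-style score under the Bayesian-linear-model reading of the abstract: for each candidate PTM, features are extracted once with the frozen model, the marginal evidence of the labels is maximized over the prior and noise precisions by a fixed-point heuristic, and the hub is sorted by the resulting per-sample log evidence. The function names (`log_evidence`, `rank_ptms`), the direct linear solves, and the one-vs-rest handling of classification labels are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import numpy as np

def log_evidence(F, y, iters=20):
    """LogME-style score: per-sample log marginal evidence of targets y under
    Bayesian linear regression on frozen features F (shape n x d).
    alpha (weight-prior precision) and beta (noise precision) are updated with
    the usual fixed-point evidence-maximization heuristic."""
    n, d = F.shape
    FtF, Fty = F.T @ F, F.T @ y
    sigma = np.linalg.eigvalsh(FtF)              # eigenvalues of F^T F
    alpha, beta = 1.0, 1.0
    for _ in range(iters):
        A = alpha * np.eye(d) + beta * FtF
        m = beta * np.linalg.solve(A, Fty)       # posterior mean of the weights
        gamma = np.sum(beta * sigma / (alpha + beta * sigma))
        res2 = np.sum((F @ m - y) ** 2)          # squared residual
        alpha = gamma / (m @ m + 1e-12)
        beta = (n - gamma) / (res2 + 1e-12)
    A = alpha * np.eye(d) + beta * FtF
    m = beta * np.linalg.solve(A, Fty)
    res2 = np.sum((F @ m - y) ** 2)
    logdet_A = np.linalg.slogdet(A)[1]
    evidence = (n * np.log(beta) + d * np.log(alpha) - n * np.log(2 * np.pi)
                - beta * res2 - alpha * (m @ m) - logdet_A) / 2
    return evidence / n

def rank_ptms(features_by_model, labels):
    """Rank PTMs by averaging the evidence over one-vs-rest label columns."""
    Y = np.eye(int(labels.max()) + 1)[labels]    # one-hot encode integer class labels
    scores = {name: float(np.mean([log_evidence(F, Y[:, k]) for k in range(Y.shape[1])]))
              for name, F in features_by_model.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A single forward pass per PTM over the target data suffices to compute these scores; the top-1 model can then be fine-tuned directly, or the top-K models used as teachers in B-Tuning.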
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation [20.62749699589017]
Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks.
We propose Test-Time Adaptation for Class-Incremental Learning (TTACIL) that first fine-tunes PTMs using Adapters on the first task.
Our TTACIL does not undergo any forgetting, while benefiting each task with the rich PTM features.
arXiv Detail & Related papers (2023-10-17T13:06:39Z)
- Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models [62.838689691468666]
We propose Federated Black-Box Prompt Tuning (Fed-BBPT) to optimally harness each local dataset.
Fed-BBPT capitalizes on a central server that aids local users in collaboratively training a prompt generator through regular aggregation.
Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines.
arXiv Detail & Related papers (2023-10-04T19:30:49Z)
- Model Spider: Learning to Rank Pre-Trained Models Efficiently [42.56392378060269]
Model Spider learns to construct tokens and measure the fitness score between a model-task pair via their tokens.
Model Spider balances efficiency and selection ability, making PTM selection like a spider preying on a web.
arXiv Detail & Related papers (2023-06-06T17:58:12Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
- ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization [65.58562481279023]
We propose ZooD, a paradigm for PTMs ranking and ensemble with feature selection.
We evaluate our paradigm on a diverse model zoo consisting of 35 models for various Out-of-Distribution (OoD) tasks.
arXiv Detail & Related papers (2022-10-17T16:31:57Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.