Not All Models Are Equal: Predicting Model Transferability in a
Self-challenging Fisher Space
- URL: http://arxiv.org/abs/2207.03036v1
- Date: Thu, 7 Jul 2022 01:33:25 GMT
- Title: Not All Models Are Equal: Predicting Model Transferability in a
Self-challenging Fisher Space
- Authors: Wenqi Shao, Xun Zhao, Yixiao Ge, Zhaoyang Zhang, Lei Yang, Xiaogang
Wang, Ying Shan, Ping Luo
- Abstract summary: This paper addresses the problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called Self-challenging Fisher Discriminant Analysis (SFDA).
- Score: 51.62131362670815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses an important problem of ranking the pre-trained deep
neural networks and screening the most transferable ones for downstream tasks.
It is challenging because the ground-truth model ranking for each task can only
be generated by fine-tuning the pre-trained models on the target dataset, which
is brute-force and computationally expensive. Recent advanced methods proposed
several lightweight transferability metrics to predict the fine-tuning results.
However, these approaches only capture static representations but neglect the
fine-tuning dynamics. To this end, this paper proposes a new transferability
metric, called \textbf{S}elf-challenging \textbf{F}isher \textbf{D}iscriminant
\textbf{A}nalysis (\textbf{SFDA}), which has many appealing benefits that
existing works do not have. First, SFDA can embed the static features into a
Fisher space and refine them for better separability between classes. Second,
SFDA uses a self-challenging mechanism to encourage different pre-trained
models to differentiate on hard examples. Third, SFDA can easily select
multiple pre-trained models for the model ensemble. Extensive experiments on
$33$ pre-trained models of $11$ downstream tasks show that SFDA is efficient,
effective, and robust when measuring the transferability of pre-trained models.
For instance, compared with the state-of-the-art method NLEEP, SFDA
demonstrates an average of $59.1$\% gain while bringing $22.5$x speedup in
wall-clock time. The code will be available at
\url{https://github.com/TencentARC/SFDA}.
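The abstract describes SFDA at a high level: project frozen features into a Fisher space with better class separability, score how well the projected features predict the target labels, and use a self-challenging step that emphasizes hard examples so candidate models are forced to differentiate. The sketch below is a minimal, hedged illustration of that idea using an off-the-shelf LDA projection and a confidence-based reweighting; the function name, the reweighting rule, and the final scoring formula are assumptions for illustration, not the authors' official implementation (see the linked repository for that).
```python
# Minimal sketch of a Fisher-space transferability score in the spirit of SFDA.
# NOTE: illustrative approximation only, not the official SFDA code
# (https://github.com/TencentARC/SFDA). The self-challenging reweighting rule
# and the final score below are assumptions made for clarity.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def fisher_space_score(features: np.ndarray, labels: np.ndarray) -> float:
    """Score a pre-trained model by class separability of its frozen features.

    features: (N, D) penultimate-layer features extracted on the target data.
    labels:   (N,) ground-truth class indices of the target data.
    """
    # 1) Embed the static features into a Fisher (discriminant) space.
    lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
    lda.fit(features, labels)
    probs = lda.predict_proba(features)               # (N, C) class posteriors

    # 2) Self-challenging-style step (assumed form): up-weight hard examples,
    #    i.e. samples the projected classifier is least confident about.
    correct = probs[np.arange(len(labels)), labels]   # prob. of the true class
    weights = 1.0 - correct                           # harder sample -> larger weight
    weights = weights / (weights.sum() + 1e-12)

    # 3) Transferability score: weighted log-likelihood of the true labels.
    #    Higher is better; rank candidate models by this value.
    return float(np.sum(weights * np.log(correct + 1e-12)))


# Usage: extract target-set features with each candidate backbone, then rank.
# scores = {name: fisher_space_score(feats[name], y) for name in candidates}
# best = max(scores, key=scores.get)
```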
Related papers
- FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework for initializing downstream networks by leveraging pre-trained models.
It decomposes pre-trained knowledge into the product of matrices (i.e., $U$, $\Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as ``learngenes''.
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
arXiv Detail & Related papers (2024-09-28T08:57:17Z) - Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z) - Initialization Matters for Adversarial Transfer Learning [61.89451332757625]
We discover the necessity of an adversarially robust pretrained model.
We propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing.
Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.
arXiv Detail & Related papers (2023-12-10T00:51:05Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - Two Independent Teachers are Better Role Model [7.001845833295753]
We propose a new deep learning model called 3D-DenseUNet.
It uses adaptable global aggregation blocks during down-sampling to address the loss of spatial information.
We also propose a new method called Two Independent Teachers, that summarizes the model weights instead of label predictions.
arXiv Detail & Related papers (2023-06-09T08:22:41Z) - Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference
in Low Resource Settings [6.463202903076821]
We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited.
Early-Exit provides a better speed-accuracy trade-off because the Multi-Model approach incurs additional overhead.
We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
arXiv Detail & Related papers (2023-06-04T09:16:39Z) - $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained
Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z) - Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel \textbf{Cross-Modal Adapter} for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
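Both the final related entry (Deep Ensembles for Low-Data Transfer Learning) and SFDA's third property concern picking several pre-trained models to ensemble. The snippet below is a generic greedy forward-selection baseline for that sub-problem; it is a hedged illustration under the assumption that per-model class probabilities on a validation split are available, and it is not the exact algorithm of either paper.
```python
# Generic greedy forward selection of a model ensemble (illustration only;
# NOT necessarily the algorithm from "Deep Ensembles for Low-Data Transfer
# Learning" or SFDA -- a common baseline for the same selection problem).
import numpy as np


def greedy_ensemble(val_probs: dict[str, np.ndarray],
                    val_labels: np.ndarray,
                    k: int = 3) -> list[str]:
    """Pick up to k pre-trained models whose averaged predictions maximize
    validation accuracy.

    val_probs:  model name -> (N, C) class probabilities on a validation split
    val_labels: (N,) ground-truth class indices
    """
    selected: list[str] = []
    remaining = set(val_probs)

    def acc(names: list[str]) -> float:
        avg = np.mean([val_probs[n] for n in names], axis=0)
        return float((avg.argmax(axis=1) == val_labels).mean())

    while remaining and len(selected) < k:
        best = max(remaining, key=lambda n: acc(selected + [n]))
        # Stop as soon as adding another model no longer helps.
        if selected and acc(selected + [best]) <= acc(selected):
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```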
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.