Mini but Mighty: Finetuning ViTs with Mini Adapters
- URL: http://arxiv.org/abs/2311.03873v1
- Date: Tue, 7 Nov 2023 10:41:27 GMT
- Title: Mini but Mighty: Finetuning ViTs with Mini Adapters
- Authors: Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière
- Abstract summary: Adapters perform poorly when their dimension is small.
We propose MiMi, a training framework that addresses this issue.
Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters.
- Score: 7.175668563148084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformers (ViTs) have become one of the dominant architectures in
computer vision, and pre-trained ViT models are commonly adapted to new tasks
via fine-tuning. Recent works proposed several parameter-efficient transfer
learning methods, such as adapters, to avoid the prohibitive training and
storage cost of fine-tuning. In this work, we observe that adapters perform
poorly when their dimension is small, and we propose MiMi, a training
framework that addresses this issue. We start with large adapters which can
reach high performance, and iteratively reduce their size. To enable automatic
estimation of the hidden dimension of every adapter, we also introduce a new
scoring function, specifically designed for adapters, that compares the neuron
importance across layers. Our method outperforms existing methods in finding
the best trade-off between accuracy and trained parameters across the three
dataset benchmarks DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
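As a concrete illustration of the recipe above (large bottleneck adapters that are iteratively shrunk using a cross-layer neuron-importance score), here is a minimal PyTorch sketch. The product-of-norms score and the 25% pruning fraction are illustrative assumptions, not MiMi's actual formulation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Standard bottleneck adapter with a residual connection."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.down = nn.Linear(dim, hidden)  # project into a small bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)    # project back to model width

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

def neuron_importance(adapter: Adapter) -> torch.Tensor:
    # Illustrative score (not the paper's): rate each bottleneck neuron by
    # the weights feeding it (a row of `down`) times the weights it emits
    # (a column of `up`), normalized per adapter so that scores are
    # comparable across layers.
    in_norm = adapter.down.weight.norm(dim=1)   # shape: (hidden,)
    out_norm = adapter.up.weight.norm(dim=0)    # shape: (hidden,)
    score = in_norm * out_norm
    return score / score.sum()

# Start large, then iteratively drop the globally least important
# bottleneck neurons and fine-tune again (training loop omitted).
adapters = [Adapter(dim=768, hidden=64) for _ in range(12)]
scores = torch.cat([neuron_importance(a) for a in adapters])
threshold = torch.quantile(scores, 0.25)        # e.g. prune the bottom 25%
keep_masks = [neuron_importance(a) >= threshold for a in adapters]
```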
Related papers
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
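The abstract does not spell out the fusion procedure; the simplest instance of fusing same-shaped adapter checkpoints into a single model is plain parameter averaging, sketched below. MerA's actual fusion (e.g. with weight alignment) may differ.

```python
import torch

def merge_adapters(state_dicts):
    """Naive fusion: average the parameters of same-shaped adapter checkpoints."""
    return {
        key: torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
```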
- Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy [17.203320079872952]
Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models.
With the exponential growth of model sizes, the conventional full fine-tuning leads to increasingly huge storage and transmission overhead.
In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network.
arXiv Detail & Related papers (2023-07-31T17:22:17Z)
- Towards Efficient Visual Adaption via Structural Re-parameterization [76.57083043547296]
We propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter.
RepAdapter outperforms full tuning by +7.2% on average and saves up to 25% training time, 20% GPU memory, and 94.6% storage cost of ViT-B/16 on VTAB-1k.
arXiv Detail & Related papers (2023-02-16T06:14:15Z)
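The property that enables this is that a purely linear adapter can be folded into the host layer's weights after training, so inference carries no extra modules. A generic sketch of that re-parameterization follows; the low-rank factorization and shapes are assumptions, not RepAdapter's exact structure.

```python
import torch

d, r = 768, 8
W, b = torch.randn(d, d), torch.randn(d)                   # host linear layer
A, B = torch.randn(r, d) * 0.01, torch.randn(d, r) * 0.01  # linear adapter

x = torch.randn(d)
y_train = W @ (x + B @ (A @ x)) + b   # adapter active during training

# After training, fold the adapter into the host weight once:
# W(x + BAx) + b == W(I + BA)x + b
W_fused = W @ (torch.eye(d) + B @ A)
y_infer = W_fused @ x + b             # no adapter module at inference

assert torch.allclose(y_train, y_infer, atol=1e-3)
```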
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters [25.958600375299735]
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters.
In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters.
Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions.
arXiv Detail & Related papers (2022-10-18T15:20:44Z)
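A rough sketch of the idea: a residual attention block whose per-head dimensionality is extremely small (here 1), so each position is updated conditioned on all the others. The head count and the in/out projections are assumptions.

```python
import torch
import torch.nn as nn

class TinyAttentionAdapter(nn.Module):
    def __init__(self, dim: int, heads: int = 8, head_dim: int = 1):
        super().__init__()
        inner = heads * head_dim                  # e.g. 8, versus dim = 768
        self.proj_in = nn.Linear(dim, inner)
        self.attn = nn.MultiheadAttention(inner, heads, batch_first=True)
        self.proj_out = nn.Linear(inner, dim)

    def forward(self, x):                         # x: (batch, seq, dim)
        h = self.proj_in(x)
        h, _ = self.attn(h, h, h)                 # every position attends to
        return x + self.proj_out(h)               # all the other positions
```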
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity, without increasing parameters or computational cost, via two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
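The two techniques are not named in the abstract; one plausible reading of a mixture-of-adapters layer is sketched below, with stochastic expert routing during training so that the per-pass parameter and compute cost stays that of a single adapter. All details here are assumptions.

```python
import random
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    def __init__(self, dim: int, hidden: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                          nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):
        if self.training:
            return x + random.choice(self.experts)(x)  # stochastic routing
        # At inference the experts could be collapsed into one adapter
        # (e.g. by averaging weights); averaging outputs stands in here.
        return x + torch.stack([e(x) for e in self.experts]).mean(dim=0)
```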
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
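In spirit, the architecture adds one shared shift vector whose magnitude is predicted per token, which is why so few parameters suffice. A minimal sketch (the placement inside the transformer block is an assumption):

```python
import torch
import torch.nn as nn

class AdapterBias(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(dim))  # shared shift direction
        self.alpha = nn.Linear(dim, 1)           # per-token magnitude

    def forward(self, x):                        # x: (batch, seq, dim)
        return x + self.alpha(x) * self.v        # token-dependent shift
```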
- VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [71.40656211497162]
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks.
We introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5.
Our results demonstrate that training the adapter with the weight-sharing technique can match the performance of fine-tuning the entire model.
arXiv Detail & Related papers (2021-12-13T17:35:26Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
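A minimal sketch of dynamically reducing cost by skipping adapters in the lowest transformer layers; the cutoff `n_drop` and the module layout are assumptions.

```python
import torch.nn as nn

def forward_with_adapter_drop(layers: nn.ModuleList,
                              adapters: nn.ModuleList,
                              x, n_drop: int = 5):
    """Run a transformer stack, skipping adapters in the first n_drop layers."""
    for i, (layer, adapter) in enumerate(zip(layers, adapters)):
        x = layer(x)
        if i >= n_drop:          # keep adapters only in the upper layers
            x = x + adapter(x)
    return x
```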
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.