Towards Optimal Adapter Placement for Efficient Transfer Learning
- URL: http://arxiv.org/abs/2410.15858v1
- Date: Mon, 21 Oct 2024 10:37:17 GMT
- Title: Towards Optimal Adapter Placement for Efficient Transfer Learning
- Authors: Aleksandra I. Nowak, Otniel-Bogdan Mercea, Anurag Arnab, Jonas Pfeiffer, Yann Dauphin, Utku Evci
- Abstract summary: PETL aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters.
Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections.
This paper investigates the relationship between the placement of an adapter and its performance.
- Score: 73.1149084352343
- Abstract: Parameter-efficient transfer learning (PETL) aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters. Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections, achieving performance comparable to full fine-tuning with significantly fewer parameters. This paper investigates the relationship between the placement of an adapter and its performance. We observe that adapter location within a network significantly impacts its effectiveness, and that the optimal placement is task-dependent. To exploit this observation, we introduce an extended search space of adapter connections, including long-range and recurrent adapters. We demonstrate that even randomly selected adapter placements from this expanded space yield improved results, and that high-performing placements often correlate with high gradient rank. Our findings reveal that a small number of strategically placed adapters can match or exceed the performance of the common baseline of adding adapters in every block, opening a new avenue for research into optimal adapter placement strategies.
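To make the mechanism concrete, the sketch below shows a minimal bottleneck adapter built from a low-rank down-projection and up-projection around a residual connection, together with an optional source argument meant to suggest the long-range placements discussed in the abstract. The class name, default rank, zero-initialization, and the way the long-range input is wired are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from typing import Optional


class LowRankAdapter(nn.Module):
    """Minimal bottleneck adapter sketch (names and defaults are illustrative)."""

    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank)  # low-rank down-projection
        self.up = nn.Linear(rank, d_model)    # low-rank up-projection
        self.act = nn.GELU()
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor, h_source: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Standard placement reads from and writes to the same activation;
        # a long-range placement could instead read from an earlier block's
        # activation (h_source) while writing into the current one.
        x = h if h_source is None else h_source
        return h + self.up(self.act(self.down(x)))
```

During PETL only the adapter parameters would be trained while the backbone stays frozen; the zero-initialized up-projection keeps the pre-trained network's behaviour unchanged at the start of fine-tuning.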
Related papers
- Pear: Pruning and Sharing Adapters in Visual Parameter-Efficient Fine-Tuning [6.068296063531189]
Adapters can exhibit redundancy, leading to unnecessary storage overhead and inferior performance.
We propose Prune and Share (Pear), a novel adapter-pruning framework for efficient fine-tuning of pretrained visual foundation models.
arXiv Detail & Related papers (2024-09-29T15:18:38Z)
- Parameter-Efficient Fine-Tuning With Adapters [5.948206235442328]
This research introduces a novel adaptation method utilizing the UniPELT framework as a base.
Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters.
arXiv Detail & Related papers (2024-05-09T01:40:38Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters [25.958600375299735]
Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters.
In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters.
Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-18T15:20:44Z)
- SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters [96.52807311742198]
We re-examine the parameter-efficiency of Adapters through the lens of network pruning.
We find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80%.
arXiv Detail & Related papers (2022-10-09T15:28:48Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
This study proposes AdapterBias, a surprisingly simple yet effective adapter architecture.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
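To give one concrete example from the list above, the following hedged sketch illustrates the tiny-attention idea: an adapter whose residual update is computed by attention with a very small per-head dimensionality, so every position is updated conditioned on all other positions. The class name, projections, zero-initialization, and default sizes are assumptions made for illustration and are not taken from the Tiny-Attention Adapter paper.

```python
import torch
import torch.nn as nn


class TinyAttentionAdapter(nn.Module):
    """Sketch of an attention adapter with a deliberately tiny inner width."""

    def __init__(self, d_model: int, n_heads: int = 1, head_dim: int = 4):
        super().__init__()
        inner = n_heads * head_dim                 # tiny attention width, e.g. 4
        self.in_proj = nn.Linear(d_model, inner)   # compress hidden states
        self.attn = nn.MultiheadAttention(inner, n_heads, batch_first=True)
        self.out_proj = nn.Linear(inner, d_model)  # expand back to model width
        nn.init.zeros_(self.out_proj.weight)       # start as an identity mapping
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Each position attends to all positions through the tiny attention,
        # and the result is added back to the hidden states as a residual update.
        x = self.in_proj(h)
        ctx, _ = self.attn(x, x, x)
        return h + self.out_proj(ctx)
```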