Adaptable Adapters
- URL: http://arxiv.org/abs/2205.01549v1
- Date: Tue, 3 May 2022 14:59:27 GMT
- Title: Adaptable Adapters
- Authors: Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna
Gurevych
- Abstract summary: State-of-the-art pretrained NLP models contain from hundreds of millions to trillions of parameters.
Adaptable adapters use different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
- Score: 74.65986170056945
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art pretrained NLP models contain from hundreds of millions
to trillions of parameters. Adapters provide a parameter-efficient alternative to full
finetuning, in which we finetune only lightweight neural network layers on top of the
pretrained weights. Adapter layers are initialized randomly. However, existing work
uses the same adapter architecture -- i.e., the same adapter layer on top of each
layer of the pretrained model -- for every dataset, regardless of the properties of
the dataset or the amount of available training data. In this work, we introduce
adaptable adapters, which contain (1) activation functions that are learned separately
for different layers and different input data, and (2) a learnable switch to select
and use only the beneficial adapter layers. We show that adaptable adapters achieve
on-par performance with the standard adapter architecture while using a considerably
smaller number of adapter layers. In addition, we show that the adapter architecture
selected by adaptable adapters transfers well across different data settings and
similar tasks. We propose using adaptable adapters to design efficient and effective
adapter architectures. The resulting adapters (a) contain about 50% of the learnable
parameters of the standard adapter, and are therefore more efficient at training and
inference and require less storage space, and (b) achieve considerably higher
performance in low-data settings.
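The two ingredients above (a learnable activation per adapter layer and a learnable switch over layers) can be illustrated with a minimal PyTorch sketch. The module names, the rational-function parameterization of the activation, and the sigmoid gate are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): an adapter block with a learnable
# rational activation and a learnable layer switch.
import torch
import torch.nn as nn


class LearnableRationalActivation(nn.Module):
    """Activation f(x) = P(x) / (1 + |Q(x)|) with learnable polynomial coefficients."""

    def __init__(self, p_order: int = 3, q_order: int = 2):
        super().__init__()
        self.p = nn.Parameter(torch.randn(p_order + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(q_order) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
        return num / den


class SwitchableAdapter(nn.Module):
    """Bottleneck adapter whose contribution is gated by a learnable switch."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = LearnableRationalActivation()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Learnable switch logit; the paper's switch could instead be trained with
        # a Gumbel-Softmax-style relaxation and thresholded after training.
        self.switch_logit = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.switch_logit)  # ~0 means "skip this adapter layer"
        return hidden + gate * self.up(self.act(self.down(hidden)))


# Usage: one adapter per transformer layer; layers whose gate collapses to ~0
# can be dropped entirely at inference time, which is where the reduced layer
# count and storage cost come from.
adapter = SwitchableAdapter(hidden_dim=768)
out = adapter(torch.randn(2, 16, 768))
```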
Related papers
- Adapters Strike Back [10.490880056507198]
We provide an in-depth study of adapters, their internal structure, as well as various implementation choices.
We suggest a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but surpasses a number of other, more complex adaptation mechanisms in several challenging settings.
arXiv Detail & Related papers (2024-06-10T22:07:57Z)
- Stylus: Automatic Adapter Selection for Diffusion Models [81.90482700433822]
We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords.
Stylus follows a three-stage approach: it first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally composes adapters based on the prompt's keywords by checking how well they fit the prompt.
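A rough sketch of that three-stage flow; the stand-in embed() encoder, the placeholder adapter descriptions, and the top-k threshold are all assumptions, not part of Stylus:

```python
# Illustrative pipeline: summarize adapters as embeddings, retrieve by
# similarity to the prompt, and keep the best-fitting ones for composition.
import torch
import torch.nn.functional as F


def embed(text: str) -> torch.Tensor:
    """Stand-in text encoder; Stylus would use a real embedding model."""
    torch.manual_seed(abs(hash(text)) % (2**31))
    return F.normalize(torch.randn(384), dim=0)


# Stage 1: precompute an embedding for each adapter's (improved) description.
adapter_db = {name: embed(desc) for name, desc in {
    "watercolor-style": "soft watercolor painting style",
    "cyberpunk-city": "neon cyberpunk cityscapes",
    "portrait-detail": "high-detail face and portrait refinement",
}.items()}


def select_adapters(prompt: str, top_k: int = 2, min_sim: float = 0.0):
    # Stage 2: retrieve adapters whose descriptions are closest to the prompt.
    q = embed(prompt)
    scores = {name: float(q @ emb) for name, emb in adapter_db.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Stage 3: keep only the adapters that fit the prompt well enough to compose.
    return [name for name, score in ranked[:top_k] if score > min_sim]


print(select_adapters("a watercolor portrait of a city at night"))
```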
arXiv Detail & Related papers (2024-04-29T17:59:16Z)
- Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models [12.230087530720652]
We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios.
The adapter consists of a single shared controller network and multiple task-level adapter heads.
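A minimal sketch of that layout, assuming a GRU as the shared controller and simple linear task heads; both choices are assumptions, and the paper's exact design may differ:

```python
# Sketch: one shared controller network feeding multiple lightweight
# task-level adapter heads, applied as a residual on the hidden states.
import torch
import torch.nn as nn


class HierarchicalAdapter(nn.Module):
    def __init__(self, hidden_dim: int, controller_dim: int, num_tasks: int):
        super().__init__()
        # Shared controller: its parameters are reused across all tasks.
        self.controller = nn.GRU(hidden_dim, controller_dim, batch_first=True)
        # Task-level heads: one small projection per task.
        self.heads = nn.ModuleList(
            nn.Linear(controller_dim, hidden_dim) for _ in range(num_tasks)
        )

    def forward(self, hidden: torch.Tensor, task_id: int) -> torch.Tensor:
        shared, _ = self.controller(hidden)          # (batch, time, controller_dim)
        return hidden + self.heads[task_id](shared)  # residual, task-specific output


adapter = HierarchicalAdapter(hidden_dim=512, controller_dim=128, num_tasks=4)
out = adapter(torch.randn(2, 50, 512), task_id=1)
```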
arXiv Detail & Related papers (2024-03-25T17:21:56Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
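A sketch of the merging step, using plain weight averaging over bottleneck adapters as a stand-in; MerA's model fusion may additionally align adapter parameters before merging:

```python
# Sketch: collapse several pretrained adapters into a single adapter by
# averaging their weights, so only one adapter is stored and executed.
import torch
import torch.nn as nn


def make_adapter(hidden_dim: int = 768, bottleneck: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Linear(hidden_dim, bottleneck), nn.ReLU(),
                         nn.Linear(bottleneck, hidden_dim))


def merge_adapters(adapters: list[nn.Module]) -> nn.Module:
    merged = make_adapter()
    state_dicts = [a.state_dict() for a in adapters]
    merged.load_state_dict({
        key: torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    })
    return merged


# "Pretrained" adapters from different tasks merge into one set of weights.
merged = merge_adapters([make_adapter() for _ in range(3)])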
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, based on two key techniques, that improves adapter capacity without increasing the parameter count or computational cost.
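One common reading of a mixture-of-adapter mechanism is several parallel adapter "experts" with stochastic routing during training and a single merged adapter at serving time; the sketch below follows that interpretation and is not the AdaMix reference implementation:

```python
# Sketch: a mixture of adapter experts. Training routes each pass through one
# random expert; inference combines the experts (outputs are averaged here as
# a simple stand-in for merging their weights into a single adapter).
import random
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck: int = 16, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Stochastic routing: each forward pass picks one expert at random,
            # so the per-step compute stays that of a single adapter.
            expert = random.choice(self.experts)
            return hidden + expert(hidden)
        # Inference: combine the experts into one effective adapter.
        out = torch.stack([expert(hidden) for expert in self.experts]).mean(dim=0)
        return hidden + out


mix = MixtureOfAdapters()
out = mix(torch.randn(2, 8, 768))
```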
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large amounts of storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
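A sketch of a token-dependent representation shift in this spirit: one shared shift vector scaled per token by a tiny linear layer; the exact formulation is an assumption based on the summary:

```python
# Sketch: each token's hidden state is shifted along a shared direction v,
# with a per-token scale produced by a single linear unit.
import torch
import torch.nn as nn


class TokenDependentShift(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_dim))  # shared shift direction
        self.alpha = nn.Linear(hidden_dim, 1)           # per-token scaling

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim); each token gets its own scaled shift.
        return hidden + self.alpha(hidden) * self.v


shift = TokenDependentShift()
out = shift(torch.randn(2, 16, 768))
# Parameter count: hidden_dim (v) + hidden_dim + 1 (alpha) -- far below a bottleneck adapter.
```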
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdapterHub: A Framework for Adapting Transformers [148.6877231725939]
AdapterHub is a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages.
Our framework enables scalable and easy sharing of task-specific models.
arXiv Detail & Related papers (2020-07-15T15:56:05Z)
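A usage sketch of the "stitch-in" workflow, based on the adapter-transformers library that accompanies AdapterHub; the class and adapter names follow its early documentation and may differ in current releases:

```python
# Load a frozen pretrained model, download a pre-trained task adapter from the
# hub, and activate it for inference.
from transformers import AutoTokenizer
from transformers import AutoModelWithHeads  # provided by adapter-transformers, not vanilla transformers

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

adapter_name = model.load_adapter("sentiment/sst-2@ukp")  # stitch in a hub adapter
model.set_active_adapters(adapter_name)

inputs = tokenizer("Adapters make sharing task-specific models easy.", return_tensors="pt")
outputs = model(**inputs)
```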