Adaptable Adapters
- URL: http://arxiv.org/abs/2205.01549v1
- Date: Tue, 3 May 2022 14:59:27 GMT
- Title: Adaptable Adapters
- Authors: Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna
Gurevych
- Abstract summary: State-of-the-art pretrained NLP models contain from hundreds of millions to trillions of parameters.
Adaptable adapters use different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
- Score: 74.65986170056945
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art pretrained NLP models contain from hundreds of millions
to trillions of parameters. Adapters provide a parameter-efficient alternative to full
finetuning, in which we finetune only lightweight neural network layers on top of the
pretrained weights. Adapter layers are initialized randomly. However, existing work
uses the same adapter architecture -- i.e., the same adapter layer on top of each
layer of the pretrained model -- for every dataset, regardless of the properties of
the dataset or the amount of available training data. In this work, we introduce
adaptable adapters, which contain (1) activation functions that are learned separately
for different layers and different input data, and (2) a learnable switch to select
and use only the beneficial adapter layers. We show that adaptable adapters achieve
on-par performance with the standard adapter architecture while using a considerably
smaller number of adapter layers. In addition, we show that the adapter architecture
selected by adaptable adapters transfers well across different data settings and
similar tasks. We propose using adaptable adapters to design efficient and effective
adapter architectures. The resulting adapters (a) contain about 50% of the learnable
parameters of the standard adapter, and are therefore more efficient at training and
inference and require less storage space, and (b) achieve considerably higher
performance in low-data settings.
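The two ingredients above (a learnable activation per adapter layer and a learnable switch over layers) can be illustrated with a minimal PyTorch sketch. The module names, the rational-function parameterization of the activation, and the sigmoid gate are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): an adapter block with a learnable
# rational activation and a learnable layer switch.
import torch
import torch.nn as nn


class LearnableRationalActivation(nn.Module):
    """Activation f(x) = P(x) / (1 + |Q(x)|) with learnable polynomial coefficients."""

    def __init__(self, p_order: int = 3, q_order: int = 2):
        super().__init__()
        self.p = nn.Parameter(torch.randn(p_order + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(q_order) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
        return num / den


class SwitchableAdapter(nn.Module):
    """Bottleneck adapter whose contribution is gated by a learnable switch."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = LearnableRationalActivation()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Learnable switch logit; the paper's switch could instead be trained with
        # a Gumbel-Softmax-style relaxation and thresholded after training.
        self.switch_logit = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.switch_logit)  # ~0 means "skip this adapter layer"
        return hidden + gate * self.up(self.act(self.down(hidden)))


# Usage: one adapter per transformer layer; layers whose gate collapses to ~0
# can be dropped entirely at inference time, which is where the reduced layer
# count and storage cost come from.
adapter = SwitchableAdapter(hidden_dim=768)
out = adapter(torch.randn(2, 16, 768))
```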
Related papers
- Adapters Strike Back [10.490880056507198]
We provide an in-depth study of adapters, their internal structure, as well as various implementation choices.
We suggest a concrete, improved adapter architecture, called Adapter+, that not only outperforms previous adapter implementations but surpasses a number of other, more complex adaptation mechanisms in several challenging settings.
arXiv Detail & Related papers (2024-06-10T22:07:57Z)
- Stylus: Automatic Adapter Selection for Diffusion Models [81.90482700433822]
We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords.
Stylus follows a three-stage approach: it first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally composes adapters based on the prompt's keywords by checking how well they fit the prompt.
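A rough sketch of that three-stage flow; the stand-in embed() encoder, the placeholder adapter descriptions, and the top-k threshold are all assumptions, not part of Stylus:

```python
# Illustrative pipeline: summarize adapters as embeddings, retrieve by
# similarity to the prompt, and keep the best-fitting ones for composition.
import torch
import torch.nn.functional as F


def embed(text: str) -> torch.Tensor:
    """Stand-in text encoder; Stylus would use a real embedding model."""
    torch.manual_seed(abs(hash(text)) % (2**31))
    return F.normalize(torch.randn(384), dim=0)


# Stage 1: precompute an embedding for each adapter's (improved) description.
adapter_db = {name: embed(desc) for name, desc in {
    "watercolor-style": "soft watercolor painting style",
    "cyberpunk-city": "neon cyberpunk cityscapes",
    "portrait-detail": "high-detail face and portrait refinement",
}.items()}


def select_adapters(prompt: str, top_k: int = 2, min_sim: float = 0.0):
    # Stage 2: retrieve adapters whose descriptions are closest to the prompt.
    q = embed(prompt)
    scores = {name: float(q @ emb) for name, emb in adapter_db.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Stage 3: keep only the adapters that fit the prompt well enough to compose.
    return [name for name, score in ranked[:top_k] if score > min_sim]


print(select_adapters("a watercolor portrait of a city at night"))
```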
arXiv Detail & Related papers (2024-04-29T17:59:16Z)
- Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models [12.230087530720652]
We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios.
The adapter consists of a single shared controller network and multiple task-level adapter heads.
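A minimal sketch of that layout, assuming a GRU as the shared controller and simple linear task heads; both choices are assumptions, and the paper's exact design may differ:

```python
# Sketch: one shared controller network feeding multiple lightweight
# task-level adapter heads, applied as a residual on the hidden states.
import torch
import torch.nn as nn


class HierarchicalAdapter(nn.Module):
    def __init__(self, hidden_dim: int, controller_dim: int, num_tasks: int):
        super().__init__()
        # Shared controller: its parameters are reused across all tasks.
        self.controller = nn.GRU(hidden_dim, controller_dim, batch_first=True)
        # Task-level heads: one small projection per task.
        self.heads = nn.ModuleList(
            nn.Linear(controller_dim, hidden_dim) for _ in range(num_tasks)
        )

    def forward(self, hidden: torch.Tensor, task_id: int) -> torch.Tensor:
        shared, _ = self.controller(hidden)          # (batch, time, controller_dim)
        return hidden + self.heads[task_id](shared)  # residual, task-specific output


adapter = HierarchicalAdapter(hidden_dim=512, controller_dim=128, num_tasks=4)
out = adapter(torch.randn(2, 50, 512), task_id=1)
```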
arXiv Detail & Related papers (2024-03-25T17:21:56Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
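A sketch of the merging step, using plain weight averaging over bottleneck adapters as a stand-in; MerA's model fusion may additionally align adapter parameters before merging:

```python
# Sketch: collapse several pretrained adapters into a single adapter by
# averaging their weights, so only one adapter is stored and executed.
import torch
import torch.nn as nn


def make_adapter(hidden_dim: int = 768, bottleneck: int = 64) -> nn.Sequential:
    return nn.Sequential(nn.Linear(hidden_dim, bottleneck), nn.ReLU(),
                         nn.Linear(bottleneck, hidden_dim))


def merge_adapters(adapters: list[nn.Module]) -> nn.Module:
    merged = make_adapter()
    state_dicts = [a.state_dict() for a in adapters]
    merged.load_state_dict({
        key: torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    })
    return merged


# "Pretrained" adapters from different tasks merge into one set of weights.
merged = merge_adapters([make_adapter() for _ in range(3)])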
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, based on two key techniques, that improves adapter capacity without increasing the parameter count or computational cost.
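One common reading of a mixture-of-adapter mechanism is several parallel adapter "experts" with stochastic routing during training and a single merged adapter at serving time; the sketch below follows that interpretation and is not the AdaMix reference implementation:

```python
# Sketch: a mixture of adapter experts. Training routes each pass through one
# random expert; inference combines the experts (outputs are averaged here as
# a simple stand-in for merging their weights into a single adapter).
import random
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck: int = 16, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Stochastic routing: each forward pass picks one expert at random,
            # so the per-step compute stays that of a single adapter.
            expert = random.choice(self.experts)
            return hidden + expert(hidden)
        # Inference: combine the experts into one effective adapter.
        out = torch.stack([expert(hidden) for expert in self.experts]).mean(dim=0)
        return hidden + out


mix = MixtureOfAdapters()
out = mix(torch.randn(2, 8, 768))
```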
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large amounts of storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
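A sketch of a token-dependent representation shift in this spirit: one shared shift vector scaled per token by a tiny linear layer; the exact formulation is an assumption based on the summary:

```python
# Sketch: each token's hidden state is shifted along a shared direction v,
# with a per-token scale produced by a single linear unit.
import torch
import torch.nn as nn


class TokenDependentShift(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_dim))  # shared shift direction
        self.alpha = nn.Linear(hidden_dim, 1)           # per-token scaling

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim); each token gets its own scaled shift.
        return hidden + self.alpha(hidden) * self.v


shift = TokenDependentShift()
out = shift(torch.randn(2, 16, 768))
# Parameter count: hidden_dim (v) + hidden_dim + 1 (alpha) -- far below a bottleneck adapter.
```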
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdapterHub: A Framework for Adapting Transformers [148.6877231725939]
AdapterHub is a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages.
Our framework enables scalable and easy sharing of task-specific models.
arXiv Detail & Related papers (2020-07-15T15:56:05Z)
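A usage sketch of the "stitch-in" workflow, based on the adapter-transformers library that accompanies AdapterHub; the class and adapter names follow its early documentation and may differ in current releases:

```python
# Load a frozen pretrained model, download a pre-trained task adapter from the
# hub, and activate it for inference.
from transformers import AutoTokenizer
from transformers import AutoModelWithHeads  # provided by adapter-transformers, not vanilla transformers

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

adapter_name = model.load_adapter("sentiment/sst-2@ukp")  # stitch in a hub adapter
model.set_active_adapters(adapter_name)

inputs = tokenizer("Adapters make sharing task-specific models easy.", return_tensors="pt")
outputs = model(**inputs)
```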