AdapterHub: A Framework for Adapting Transformers
- URL: http://arxiv.org/abs/2007.07779v3
- Date: Tue, 6 Oct 2020 10:16:39 GMT
- Title: AdapterHub: A Framework for Adapting Transformers
- Authors: Jonas Pfeiffer, Andreas Rückle, Clifton Poth, Aishwarya Kamath,
Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
- Abstract summary: AdapterHub is a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages.
Our framework enables scalable and easy sharing of task-specific models.
- Score: 148.6877231725939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current modus operandi in NLP involves downloading and fine-tuning
pre-trained models consisting of millions or billions of parameters. Storing
and sharing such large trained models is expensive, slow, and time-consuming,
which impedes progress towards more general and versatile NLP methods that
learn from and for many tasks. Adapters -- small learnt bottleneck layers
inserted within each layer of a pre-trained model -- ameliorate this issue by
avoiding full fine-tuning of the entire model. However, sharing and integrating
adapter layers is not straightforward. We propose AdapterHub, a framework that
allows dynamic "stitching-in" of pre-trained adapters for different tasks and
languages. The framework, built on top of the popular HuggingFace Transformers
library, enables extremely easy and quick adaptations of state-of-the-art
pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages.
Downloading, sharing, and training adapters is as seamless as possible using
minimal changes to the training scripts and a specialized infrastructure. Our
framework enables scalable and easy sharing of task-specific models,
particularly in low-resource scenarios. AdapterHub includes all recent adapter
architectures and can be found at https://AdapterHub.ml.
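The abstract describes adapters as small bottleneck layers inserted within each layer of a frozen pre-trained model, so that only the adapter weights are trained. The snippet below is a minimal PyTorch sketch of such a bottleneck module; the class name, default bottleneck size, and exact residual placement are illustrative assumptions rather than the precise AdapterHub implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small bottleneck layer inserted after a transformer sub-layer:
    down-project, non-linearity, up-project, residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.activation = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen pre-trained
        # representation; the adapter learns only a small task-specific shift.
        return hidden_states + self.up(self.activation(self.down(hidden_states)))

# Only the adapter parameters are optimized; the pre-trained model stays frozen.
hidden_size = 768                          # e.g. BERT-base hidden size
adapter = BottleneckAdapter(hidden_size)
hidden = torch.randn(2, 16, hidden_size)   # (batch, sequence, hidden)
print(adapter(hidden).shape)               # torch.Size([2, 16, 768])
```

In AdapterHub itself, inserting, training, loading, and activating such modules is exposed through high-level calls on the HuggingFace model object (methods such as add_adapter, train_adapter, load_adapter, and set_active_adapters in the adapter-transformers library); exact signatures vary across library versions.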
Related papers
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, based on two key techniques, to improve adapter capacity without increasing parameters or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performance with the standard adapter architecture.
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- AdapterHub Playground: Simple and Flexible Few-Shot Learning with Adapters [34.86139827292556]
Open-access dissemination of pretrained language models has led to a democratization of state-of-the-art natural language processing (NLP) research.
This also allows people outside of NLP to use such models and adapt them to specific use cases; however, doing so still typically requires writing code.
In this work, we aim to overcome this gap by providing a tool which allows researchers to leverage pretrained models without writing a single line of code.
arXiv Detail & Related papers (2021-08-18T11:56:01Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks (see the sketch after this list).
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
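As a companion to the shared-hypernetworks entry above, the following sketch illustrates the general idea of generating bottleneck-adapter weights from one small network conditioned on task and layer embeddings; the module names, dimensions, and conditioning scheme are illustrative assumptions, not the architecture from that paper.

```python
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """Illustrative shared hypernetwork: instead of storing separate adapter
    matrices per task and per layer, generate them from (task, layer) embeddings."""

    def __init__(self, num_tasks: int, num_layers: int,
                 hidden_size: int = 768, bottleneck: int = 64, embed_dim: int = 64):
        super().__init__()
        self.hidden_size, self.bottleneck = hidden_size, bottleneck
        self.task_embed = nn.Embedding(num_tasks, embed_dim)
        self.layer_embed = nn.Embedding(num_layers, embed_dim)
        # One shared generator outputs both the down- and up-projection weights.
        self.generator = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * hidden_size * bottleneck),
        )

    def forward(self, task_id: torch.Tensor, layer_id: torch.Tensor):
        cond = torch.cat([self.task_embed(task_id), self.layer_embed(layer_id)], dim=-1)
        down_flat, up_flat = self.generator(cond).split(self.hidden_size * self.bottleneck, dim=-1)
        down = down_flat.view(self.bottleneck, self.hidden_size)
        up = up_flat.view(self.hidden_size, self.bottleneck)
        return down, up

hyper = AdapterHypernetwork(num_tasks=8, num_layers=12)
down_w, up_w = hyper(torch.tensor(3), torch.tensor(5))   # weights for task 3, layer 5
x = torch.randn(2, 16, 768)
adapted = x + torch.relu(x @ down_w.T) @ up_w.T          # bottleneck adapter with generated weights
print(adapted.shape)                                     # torch.Size([2, 16, 768])
```

Because the generator is shared across tasks and layers, the per-task overhead in this sketch is essentially the task embedding; the 0.29% figure quoted above refers to the paper's actual architecture, not to this illustration.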
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.