AdapterBias: Parameter-efficient Token-dependent Representation Shift
for Adapters in NLP Tasks
- URL: http://arxiv.org/abs/2205.00305v1
- Date: Sat, 30 Apr 2022 16:49:41 GMT
- Title: AdapterBias: Parameter-efficient Token-dependent Representation Shift
for Adapters in NLP Tasks
- Authors: Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee
- Abstract summary: Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
- Score: 55.705355299065474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based pre-trained models with millions of parameters require
large storage. Recent approaches tackle this shortcoming by training adapters,
but these approaches still require a relatively large number of parameters. In
this study, AdapterBias, a surprisingly simple yet effective adapter
architecture, is proposed. AdapterBias adds a token-dependent shift to the
hidden output of transformer layers to adapt to downstream tasks with only a
vector and a linear layer. Extensive experiments are conducted to demonstrate
the effectiveness of AdapterBias. The experiments show that our proposed method
can dramatically reduce the trainable parameters compared to previous works,
with a minimal decrease in task performance compared with fine-tuned
pre-trained models. We further find that AdapterBias automatically learns to
assign more significant representation shifts to the tokens related to the task
under consideration.
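The token-dependent shift described in the abstract uses only a shared vector and a single linear layer that produces a per-token weight. Below is a minimal PyTorch sketch of that idea; the module name, zero initialization, and hidden size are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a token-dependent representation shift in the spirit of
# AdapterBias (assumed design: a shared vector v plus a linear layer that
# produces a per-token scalar weight). Names are illustrative.
import torch
import torch.nn as nn

class TokenDependentShift(nn.Module):
    """Adds alpha_i * v to each token's hidden state, where v is a learned
    vector shared across tokens and alpha_i is a per-token scalar produced
    by a single linear layer."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_size))  # shared shift vector (starts at no shift)
        self.alpha = nn.Linear(hidden_size, 1)           # produces the token-dependent weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        weights = self.alpha(hidden_states)              # (batch, seq_len, 1)
        return hidden_states + weights * self.v          # broadcast shift per token


# Usage: shift the output of one (frozen) transformer layer.
shift = TokenDependentShift(hidden_size=768)
h = torch.randn(2, 16, 768)                              # dummy hidden states
print(shift(h).shape)                                    # torch.Size([2, 16, 768])
```

In a typical parameter-efficient setup, only modules like this would be trained per task while the pre-trained backbone stays frozen.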
Related papers
- Towards Optimal Adapter Placement for Efficient Transfer Learning [73.1149084352343]
Parameter-efficient transfer learning (PETL) aims to adapt pre-trained models to new downstream tasks while minimizing the number of fine-tuned parameters.
Adapters, a popular approach in PETL, inject additional capacity into existing networks by incorporating low-rank projections (see the bottleneck sketch after this list).
This paper investigates the relationship between the placement of an adapter and its performance.
arXiv Detail & Related papers (2024-10-21T10:37:17Z)
- Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models [12.230087530720652]
We introduce an adapter module that offers better efficiency in large-scale multi-task adaptation scenarios.
The adapter consists of a single shared controller network and multiple task-level adapter heads.
arXiv Detail & Related papers (2024-03-25T17:21:56Z)
- Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification [38.20393847192532]
Self-supervised speech models have shown impressive performance on various downstream speech tasks.
However, fine-tuning becomes practically infeasible due to heavy computation and storage overhead.
We propose an effective adapter framework designed for adapting self-supervised speech models to the speaker verification task.
arXiv Detail & Related papers (2024-03-01T05:32:14Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy [17.203320079872952]
Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models.
With the exponential growth of model sizes, the conventional full fine-tuning leads to increasingly huge storage and transmission overhead.
In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network.
arXiv Detail & Related papers (2023-07-31T17:22:17Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Adaptable Adapters [74.65986170056945]
State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters.
Adaptable adapters contain different activation functions for different layers and different input data.
We show that adaptable adapters achieve on-par performances with the standard adapter architecture.
arXiv Detail & Related papers (2022-05-03T14:59:27Z)
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
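As referenced in the first related paper above, the standard adapter design injects capacity through low-rank (bottleneck) projections wrapped in a residual connection. The sketch below is a minimal PyTorch illustration under common assumptions (ReLU non-linearity, bottleneck size 64, illustrative names); it is not tied to any specific paper's implementation.

```python
# Minimal sketch of a bottleneck adapter: a low-rank down/up projection
# with a residual connection, inserted into a frozen backbone. Assumed
# design choices (activation, bottleneck size) are illustrative.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # low-rank down-projection
        self.up = nn.Linear(bottleneck, hidden_size)     # low-rank up-projection
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen backbone's representation.
        return x + self.up(self.act(self.down(x)))


adapter = BottleneckAdapter(hidden_size=768)
h = torch.randn(2, 16, 768)
print(adapter(h).shape)  # torch.Size([2, 16, 768])
```

Compared with this bottleneck design, the AdapterBias sketch earlier trains only a shift vector and a single weight-producing linear layer per layer, which is where its parameter savings come from.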