Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
- URL: http://arxiv.org/abs/2106.04489v1
- Date: Tue, 8 Jun 2021 16:16:40 GMT
- Title: Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
- Authors: Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James
Henderson
- Abstract summary: We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
- Score: 37.2958914602899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art parameter-efficient fine-tuning methods rely on introducing
adapter modules between the layers of a pretrained language model. However,
such modules are trained separately for each task and thus do not enable
sharing information across tasks. In this paper, we show that we can learn
adapter parameters for all layers and tasks by generating them using shared
hypernetworks, which condition on task, adapter position, and layer id in a
transformer model. This parameter-efficient multi-task learning framework
allows us to achieve the best of both worlds by sharing knowledge across tasks
via hypernetworks while enabling the model to adapt to each individual task
through task-specific adapters. Experiments on the well-known GLUE benchmark
show improved performance in multi-task learning while adding only 0.29%
parameters per task. We additionally demonstrate substantial performance
improvements in few-shot domain generalization across a variety of tasks. Our
code is publicly available at https://github.com/rabeehk/hyperformer.
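To make the mechanism concrete, the following is a minimal PyTorch sketch of a shared hypernetwork that maps a (task, layer, adapter position) triple to the weights of a bottleneck adapter. The class names, embedding sizes, and the single-linear-generator design are illustrative assumptions, not the authors' exact HyperFormer implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn

class AdapterHyperNet(nn.Module):
    """Sketch: one shared generator produces adapter weights for every
    task, layer, and adapter position (dimensions are assumptions)."""

    def __init__(self, n_tasks, n_layers, n_positions=2,
                 d_model=768, bottleneck=64, d_embed=64):
        super().__init__()
        # Learned embeddings for task id, layer id, and adapter position
        # (e.g. after self-attention vs. after the feed-forward block).
        self.task_emb = nn.Embedding(n_tasks, d_embed)
        self.layer_emb = nn.Embedding(n_layers, d_embed)
        self.pos_emb = nn.Embedding(n_positions, d_embed)
        # Mix the three embeddings into a single conditioning vector.
        self.proj = nn.Sequential(nn.Linear(3 * d_embed, d_embed), nn.ReLU())
        # Shared generators for the down- and up-projection matrices.
        self.gen_down = nn.Linear(d_embed, d_model * bottleneck)
        self.gen_up = nn.Linear(d_embed, bottleneck * d_model)
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, task_id, layer_id, pos_id):
        ids = lambda i: torch.tensor([i])
        cond = torch.cat([self.task_emb(ids(task_id)),
                          self.layer_emb(ids(layer_id)),
                          self.pos_emb(ids(pos_id))], dim=-1)
        h = self.proj(cond)
        w_down = self.gen_down(h).view(self.d_model, self.bottleneck)
        w_up = self.gen_up(h).view(self.bottleneck, self.d_model)
        return w_down, w_up


def adapter_forward(x, w_down, w_up):
    # Standard bottleneck adapter with a residual connection, using the
    # generated weights instead of separately trained per-task weights.
    return x + torch.relu(x @ w_down) @ w_up


# Usage: weights for task 2, layer 5, the adapter after the feed-forward block.
hyper = AdapterHyperNet(n_tasks=8, n_layers=12)
w_down, w_up = hyper(task_id=2, layer_id=5, pos_id=1)
x = torch.randn(4, 16, 768)          # (batch, seq_len, d_model)
y = adapter_forward(x, w_down, w_up)
```

Only the hypernetwork and the conditioning embeddings are trained, so the per-task overhead reduces to a small task embedding while the adapter weights for all layers are shared through the generator.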
Related papers
- HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling [5.955463697605461]
We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting.
Our method combines the benefits of multi-task learning by capturing the structure of all tasks.
We provide empirical evidence that HyperLoader outperforms previous approaches in most datasets.
arXiv Detail & Related papers (2024-07-01T16:00:53Z)
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
- Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
arXiv Detail & Related papers (2020-05-01T07:03:42Z)