Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
- URL: http://arxiv.org/abs/2106.04489v1
- Date: Tue, 8 Jun 2021 16:16:40 GMT
- Title: Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
- Authors: Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James
Henderson
- Abstract summary: We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
- Score: 37.2958914602899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art parameter-efficient fine-tuning methods rely on introducing
adapter modules between the layers of a pretrained language model. However,
such modules are trained separately for each task and thus do not enable
sharing information across tasks. In this paper, we show that we can learn
adapter parameters for all layers and tasks by generating them using shared
hypernetworks, which condition on task, adapter position, and layer id in a
transformer model. This parameter-efficient multi-task learning framework
allows us to achieve the best of both worlds by sharing knowledge across tasks
via hypernetworks while enabling the model to adapt to each individual task
through task-specific adapters. Experiments on the well-known GLUE benchmark
show improved performance in multi-task learning while adding only 0.29%
parameters per task. We additionally demonstrate substantial performance
improvements in few-shot domain generalization across a variety of tasks. Our
code is publicly available at https://github.com/rabeehk/hyperformer.
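To make the mechanism concrete, the following is a minimal PyTorch sketch of a shared hypernetwork that maps a (task, layer, adapter position) triple to the weights of a bottleneck adapter. The class names, embedding sizes, and the single-linear-generator design are illustrative assumptions, not the authors' exact HyperFormer implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn

class AdapterHyperNet(nn.Module):
    """Sketch: one shared generator produces adapter weights for every
    task, layer, and adapter position (dimensions are assumptions)."""

    def __init__(self, n_tasks, n_layers, n_positions=2,
                 d_model=768, bottleneck=64, d_embed=64):
        super().__init__()
        # Learned embeddings for task id, layer id, and adapter position
        # (e.g. after self-attention vs. after the feed-forward block).
        self.task_emb = nn.Embedding(n_tasks, d_embed)
        self.layer_emb = nn.Embedding(n_layers, d_embed)
        self.pos_emb = nn.Embedding(n_positions, d_embed)
        # Mix the three embeddings into a single conditioning vector.
        self.proj = nn.Sequential(nn.Linear(3 * d_embed, d_embed), nn.ReLU())
        # Shared generators for the down- and up-projection matrices.
        self.gen_down = nn.Linear(d_embed, d_model * bottleneck)
        self.gen_up = nn.Linear(d_embed, bottleneck * d_model)
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, task_id, layer_id, pos_id):
        ids = lambda i: torch.tensor([i])
        cond = torch.cat([self.task_emb(ids(task_id)),
                          self.layer_emb(ids(layer_id)),
                          self.pos_emb(ids(pos_id))], dim=-1)
        h = self.proj(cond)
        w_down = self.gen_down(h).view(self.d_model, self.bottleneck)
        w_up = self.gen_up(h).view(self.bottleneck, self.d_model)
        return w_down, w_up


def adapter_forward(x, w_down, w_up):
    # Standard bottleneck adapter with a residual connection, using the
    # generated weights instead of separately trained per-task weights.
    return x + torch.relu(x @ w_down) @ w_up


# Usage: weights for task 2, layer 5, the adapter after the feed-forward block.
hyper = AdapterHyperNet(n_tasks=8, n_layers=12)
w_down, w_up = hyper(task_id=2, layer_id=5, pos_id=1)
x = torch.randn(4, 16, 768)          # (batch, seq_len, d_model)
y = adapter_forward(x, w_down, w_up)
```

Only the hypernetwork and the conditioning embeddings are trained, so the per-task overhead reduces to a small task embedding while the adapter weights for all layers are shared through the generator.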
Related papers
- HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling [5.955463697605461]
We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting.
Our method combines the benefits of multi-task learning by capturing the structure of all tasks.
We provide empirical evidence that HyperLoader outperforms previous approaches in most datasets.
arXiv Detail & Related papers (2024-07-01T16:00:53Z)
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
arXiv Detail & Related papers (2022-11-07T19:35:55Z)
- Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)
- AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
arXiv Detail & Related papers (2020-05-01T07:03:42Z)