ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
- URL: http://arxiv.org/abs/2310.01217v3
- Date: Fri, 17 May 2024 14:23:08 GMT
- Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
- Authors: Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz,
- Abstract summary: ScaLearn is a simple and highly parameter-efficient two-stage MTL method.
We show that ScaLearn consistently outperforms strong baselines with a small number of transfer parameters.
- Score: 18.396897413970965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task learning (MTL) has shown considerable practical benefits, particularly when using language models (LMs). While this is commonly achieved by learning $n$ tasks under a joint optimization procedure, some methods, such as AdapterFusion, divide the problem into two stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits (e.g., promoting reusability). However, current two-stage MTL introduces a substantial number of additional parameters. We address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) and two encoder LMs show that ScaLearn consistently outperforms strong baselines with a small number of transfer parameters (~ $0.35$% of those of AdapterFusion). Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters, achieving competitive results with only $8$ transfer parameters per target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
Related papers
- 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability [6.451743797015637]
We introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt large language models (LLMs)
RoAd is remarkably parameter-efficient, delivering optimal performance on GLUE, eight commonsense reasoning tasks and four arithmetic reasoning tasks with $0.1%$ trainable parameters.
arXiv Detail & Related papers (2024-08-28T08:45:29Z) - VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense
Scene Understanding [6.816428690763012]
A standard approach to leverage large-scale pre-trained models is to fine-tune all model parameters for downstream tasks.
We propose VMT-Adapter, which shares knowledge from multiple tasks to enhance cross-task interaction.
We also propose VMT-Adapter-Lite, which further reduces the trainable parameters by learning shared parameters between down- and up-projections.
arXiv Detail & Related papers (2023-12-14T08:25:04Z) - $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal
Multi-task Interpolation [30.551283402200657]
$pi$-Tuning is a universal parameter-efficient transfer learning method for vision, language, and vision-language tasks.
It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task.
arXiv Detail & Related papers (2023-04-27T17:49:54Z) - When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning [15.39115079099451]
Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
arXiv Detail & Related papers (2022-05-17T06:48:45Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
Theses provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning
in NLP Using Fewer Parameters & Less Data [5.689320790746046]
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks.
However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer.
We propose a novel Transformer architecture consisting of a new conditional attention mechanism and a set of task-conditioned modules.
arXiv Detail & Related papers (2020-09-19T02:04:34Z) - Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.