Efficiently Tuned Parameters are Task Embeddings
- URL: http://arxiv.org/abs/2210.11705v1
- Date: Fri, 21 Oct 2022 03:19:54 GMT
- Title: Efficiently Tuned Parameters are Task Embeddings
- Authors: Wangchunshu Zhou and Canwen Xu and Julian McAuley
- Abstract summary: Intermediate-task transfer can benefit a wide range of NLP tasks with properly selected source datasets.
It is computationally infeasible to experiment with all intermediate transfer combinations.
We propose to exploit these efficiently tuned parameters as off-the-shelf task embeddings.
- Score: 26.587153525003636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intermediate-task transfer can benefit a wide range of NLP tasks with
properly selected source datasets. However, it is computationally infeasible to
experiment with all intermediate transfer combinations, making choosing a
useful source task a challenging problem. In this paper, we anticipate that
task-specific parameters updated in parameter-efficient tuning methods are
likely to encode task-specific information and can therefore be predictive of
inter-task transferability. We thus propose to exploit these
efficiently tuned parameters as off-the-shelf task embeddings for the efficient
selection of source datasets for intermediate-task transfer. We experiment with
11 text classification tasks and 11 question answering tasks. Experimental
results show that our approach can consistently outperform existing inter-task
transferability prediction methods while being conceptually simple and
computationally efficient. Our analysis also reveals that the ability of
efficiently tuned parameters to predict transferability is disentangled from
their in-task performance. This allows us to use parameters from early
checkpoints as task embeddings to further improve efficiency.
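A minimal sketch of this selection recipe (illustrative only, not the authors' released code): treat the flattened parameters learned by a parameter-efficient tuning method, here a toy soft prompt, as the task embedding, and rank candidate source tasks by cosine similarity to the target task's embedding. The specific tuning method and similarity measure used in the paper may differ.

```python
import numpy as np

def task_embedding(tuned_params: dict) -> np.ndarray:
    """Concatenate all task-specific tensors into one flat vector."""
    return np.concatenate([p.ravel() for _, p in sorted(tuned_params.items())])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_source_tasks(target: dict, sources: dict) -> list:
    """Sort candidate source tasks by similarity to the target task embedding."""
    t = task_embedding(target)
    scores = {name: cosine(t, task_embedding(p)) for name, p in sources.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy prompt parameters; shapes and task names are illustrative only.
rng = np.random.default_rng(0)
target = {"prompt": rng.normal(size=(20, 768))}
sources = {name: {"prompt": rng.normal(size=(20, 768))} for name in ("mnli", "squad", "sst2")}
print(rank_source_tasks(target, sources))
```

Because the embeddings are just the tuned parameters, selection costs one parameter-efficient tuning run per task (or, per the analysis above, an early checkpoint of that run) plus a few dot products.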
Related papers
- Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning [21.652389166495407]
We show that the transfer performance exhibits severe variance across different source tasks and training seeds.
Compared to embedding-free methods and text embeddings, task embeddings constructed from fine-tuned weights can better estimate task transferability.
We introduce a novel method that measures pairwise token similarity using maximum inner product search (sketched below), leading to the highest performance in task prediction.
arXiv Detail & Related papers (2024-07-23T07:31:43Z) - Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private
Tuning [32.69028093984526]
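A rough sketch of the token-similarity idea in the entry above, with toy inputs (which embeddings are compared, and how scores are pooled and normalized, may differ in the paper): for each target-task token, take its maximum inner product over the source-task tokens, then average the maxima.

```python
import numpy as np

def mips_similarity(target_tokens: np.ndarray, source_tokens: np.ndarray) -> float:
    """target_tokens: (n, d) and source_tokens: (m, d) token-embedding matrices."""
    inner = target_tokens @ source_tokens.T      # (n, m) pairwise inner products
    return float(inner.max(axis=1).mean())       # exhaustive MIPS per token, then mean

rng = np.random.default_rng(0)
tgt = rng.normal(size=(128, 64))                 # toy target-task token embeddings
src = rng.normal(size=(256, 64))                 # toy source-task token embeddings
print(mips_similarity(tgt, src))
```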
- Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning [32.69028093984526]
We propose a novel language transformer finetuning strategy that introduces task-specific parameters in multiple transformer layers.
We achieve within 5% of full finetuning performance on GLUE tasks with as few as 4,100 parameters per task.
Our method achieves the best or comparable utility compared to several recent finetuning methods when training with the same privacy constraints.
arXiv Detail & Related papers (2023-05-30T17:55:06Z) - Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm [21.747744343882392]
PETL aims at efficiently adapting large models pre-trained on massive data to downstream tasks with limited task-specific data.
This paper proposes a novel two-stage paradigm, where the pre-trained model is first aligned to the target distribution.
The proposed paradigm achieves state-of-the-art performance on the average accuracy of 19 downstream tasks.
arXiv Detail & Related papers (2023-03-14T13:50:31Z) - Divergence-Based Domain Transferability for Zero-Shot Classification [78.55044112903148]
Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks.
Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task.
However, how to identify related tasks is an open problem, and brute-force searching effective task combinations is prohibitively expensive.
arXiv Detail & Related papers (2023-02-11T16:04:38Z) - Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient
Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., it updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions; the prompt-mixture idea is sketched below.
arXiv Detail & Related papers (2022-05-24T10:48:33Z) - Identifying Suitable Tasks for Inductive Transfer Through the Analysis
of Feature Attributions [78.55044112903148]
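A hedged sketch of an attention-weighted mixture of soft prompts in the spirit of ATTEMPT above. The pooling of the input into a query, the mean-pooled prompt keys, and the toy dimensions are assumptions for illustration; the paper's attention parameterization and training procedure differ in detail. The resulting mixed prompt would be prepended to the input of the frozen LM.

```python
import torch
import torch.nn as nn

class PromptMixture(nn.Module):
    def __init__(self, source_prompts: torch.Tensor, prompt_len: int, dim: int):
        super().__init__()
        # Soft prompts from previously learned tasks, kept frozen: (num_tasks, prompt_len, dim).
        self.register_buffer("source_prompts", source_prompts)
        self.target_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.query = nn.Linear(dim, dim)  # maps the pooled input to an attention query

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq, dim); pool to one query vector per example.
        q = self.query(input_embeds.mean(dim=1))                     # (batch, dim)
        candidates = torch.cat([self.source_prompts,
                                self.target_prompt.unsqueeze(0)])    # (K+1, prompt_len, dim)
        keys = candidates.mean(dim=1)                                # (K+1, dim)
        attn = torch.softmax(q @ keys.T, dim=-1)                     # (batch, K+1)
        return torch.einsum("bk,kld->bld", attn, candidates)         # instance-wise prompt mix

mix = PromptMixture(torch.randn(3, 10, 32), prompt_len=10, dim=32)
print(mix(torch.randn(4, 16, 32)).shape)  # torch.Size([4, 10, 32])
```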
- Identifying Suitable Tasks for Inductive Transfer Through the Analysis of Feature Attributions [78.55044112903148]
We use explainability techniques to predict whether task pairs will be complementary, through comparison of neural network activation between single-task models.
Our results show that, through this approach, it is possible to reduce training time by up to 83.5% at a cost of only a 0.034 reduction in positive-class F1 on the TREC-IS 2020-A dataset; the activation-comparison idea is sketched below.
arXiv Detail & Related papers (2022-02-02T15:51:07Z) - What to Pre-Train on? Efficient Intermediate Task Selection [46.15624815492324]
- What to Pre-Train on? Efficient Intermediate Task Selection [46.15624815492324]
Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks.
In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings.
We then consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning.
arXiv Detail & Related papers (2021-04-16T17:31:18Z) - Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs).
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map-transformation-based approach (sketched below) outperforms state-of-the-art continual GAN methods.
arXiv Detail & Related papers (2021-03-06T05:09:37Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark; a simplified version of the idea is sketched below.
arXiv Detail & Related papers (2020-12-14T12:34:01Z) - Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task (one such embedding is sketched below).
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.