Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning
- URL: http://arxiv.org/abs/2110.07867v1
- Date: Fri, 15 Oct 2021 05:43:59 GMT
- Title: Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning
- Authors: Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan
Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, Jie Zhou
- Abstract summary: In this work, we study how pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of superficially different NLP tasks.
In experiments on diverse few-shot NLP tasks, we surprisingly find that in a 5-dimensional subspace found with 100 random tasks, tuning only 5 free parameters recovers 87% and 65% of full prompt tuning performance for seen and unseen tasks, respectively.
- Score: 70.76016793057283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can pre-trained language models (PLMs) learn universal representations
and effectively adapt to a broad range of superficially different NLP tasks? In this
work, we empirically find evidence indicating that the adaptations of PLMs to
various tasks can be reparameterized as optimizing only a few free parameters
in a common low-dimensional intrinsic task subspace, which may help us
understand why PLMs could easily adapt to various NLP tasks with small-scale
data. Specifically, to find such a subspace and examine its universality, we
resort to the recent success of prompt tuning and decompose the soft prompts of
multiple NLP tasks into the same low-dimensional nonlinear subspace, then we
learn to adapt the PLM to unseen tasks or data by only tuning parameters in the
subspace. We dub this pipeline intrinsic prompt tuning (IPT). In
experiments, we study diverse few-shot NLP tasks and surprisingly find that in
a 5-dimensional subspace found with 100 random tasks, by only tuning 5 free
parameters, we can recover 87% and 65% of the full prompt tuning performance
for 100 seen tasks (using different training data) and 20 unseen tasks,
respectively, showing the strong generalization ability of the found intrinsic task
subspace. Besides being an analysis tool, IPT could further bring practical
benefits, such as improving prompt tuning stability.
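To make the reparameterization concrete, here is a minimal PyTorch sketch of the adaptation stage: a soft prompt is produced by a shared, frozen nonlinear decoder from a handful of intrinsic parameters, and only those parameters are tuned for a new task. The class name, decoder architecture, and dimensions are illustrative assumptions rather than the authors' implementation; the subspace-finding stage that trains the decoder on prompts from many tasks is omitted.

```python
import torch
import torch.nn as nn

class IntrinsicPrompt(nn.Module):
    """Soft prompt reparameterized as decoder(z): z lives in a low-dimensional
    intrinsic task subspace shared across tasks, and only z is tuned when
    adapting to a new task."""

    def __init__(self, intrinsic_dim=5, prompt_len=100, hidden_dim=768):
        super().__init__()
        # The only trainable parameters during adaptation (5 in the extreme case).
        self.z = nn.Parameter(torch.zeros(intrinsic_dim))
        # Shared nonlinear decoder back to a full soft prompt; it is obtained
        # by decomposing prompts from many training tasks and then frozen.
        self.decoder = nn.Sequential(
            nn.Linear(intrinsic_dim, 256),
            nn.Tanh(),
            nn.Linear(256, prompt_len * hidden_dim),
        )
        for p in self.decoder.parameters():
            p.requires_grad_(False)
        self.prompt_len, self.hidden_dim = prompt_len, hidden_dim

    def forward(self):
        # Soft prompt embeddings to prepend to the inputs of a frozen PLM.
        return self.decoder(self.z).view(self.prompt_len, self.hidden_dim)


prompt = IntrinsicPrompt()
print([n for n, p in prompt.named_parameters() if p.requires_grad])  # ['z']
print(prompt().shape)                                                # torch.Size([100, 768])
```

Training then optimizes only `prompt.z` (e.g., 5 values) against the task loss of the frozen PLM, whereas full prompt tuning would optimize all `prompt_len * hidden_dim` prompt entries.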
Related papers
- Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific
Subspaces of Pre-trained Language Models [16.28794184086409]
Pre-trained language models (PLMs) are known to be over-parameterized and to have significant redundancy.
We study the problem of re-parameterizing and fine-tuning PLMs from a new perspective: the discovery of an intrinsic task-specific subspace.
A key finding is that PLMs can be effectively fine-tuned in the subspace with a small number of free parameters.
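A weight-space sketch of this reparameterization is shown below, assuming a fixed random basis as a stand-in for the task-specific subspace such a method would actually discover; the class and variable names are invented for the example.

```python
import torch
import torch.nn as nn

class SubspaceLinear(nn.Module):
    """Fine-tune a linear layer inside a low-dimensional subspace:
    W = W0 + reshape(basis @ z). Only z is trainable; the pre-trained
    weights and the basis are frozen."""

    def __init__(self, linear: nn.Linear, subspace_dim: int = 8):
        super().__init__()
        self.register_buffer("weight0", linear.weight.detach().clone())
        self.register_buffer("bias0", linear.bias.detach().clone())
        n = self.weight0.numel()
        # Random basis as a stand-in for a discovered task-specific subspace.
        self.register_buffer("basis", torch.randn(n, subspace_dim) / n ** 0.5)
        self.z = nn.Parameter(torch.zeros(subspace_dim))  # the only free parameters

    def forward(self, x):
        delta = (self.basis @ self.z).view_as(self.weight0)
        return nn.functional.linear(x, self.weight0 + delta, self.bias0)


layer = SubspaceLinear(nn.Linear(768, 768), subspace_dim=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 8
print(layer(torch.randn(4, 768)).shape)                               # torch.Size([4, 768])
```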
arXiv Detail & Related papers (2023-05-27T11:16:26Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning [28.29889045842277]
Multitask prompted learning can improve generalization by training on a diverse set of tasks at once.
We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning.
arXiv Detail & Related papers (2022-12-21T11:18:09Z)
- AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning [19.201899503691266]
We measure the task dominance degree of a parameter by the total updates of each task on this parameter.
We propose a Task-wise Adaptive learning rate approach, AdaTask, to separate the accumulative gradients, and hence the learning rate, of each task.
Experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks.
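The accumulator separation can be sketched as follows, with Adagrad-style accumulation as a simplified stand-in for the adaptive optimizers the paper targets; the function, tensors, and toy gradients below are assumptions for the example, not the AdaTask implementation.

```python
import torch

def task_wise_adagrad_step(param, task_grads, accumulators, lr=0.01, eps=1e-10):
    """Apply one update where each task keeps its own accumulator of squared
    gradients, so a dominant task does not shrink the effective learning rate
    of the other tasks sharing this parameter."""
    update = torch.zeros_like(param)
    for task_id, grad in task_grads.items():
        acc = accumulators.setdefault(task_id, torch.zeros_like(param))
        acc += grad ** 2                              # per-task accumulation
        update += lr * grad / (acc.sqrt() + eps)      # per-task adaptive step
    with torch.no_grad():
        param -= update


# Toy usage: one shared parameter, two tasks with very different gradient scales.
param = torch.zeros(3, requires_grad=True)
accumulators = {}
for step in range(100):
    task_grads = {
        "task_a": torch.tensor([10.0, 0.0, 1.0]),     # dominant task
        "task_b": torch.tensor([0.0, 0.1, 0.1]),      # small-gradient task
    }
    task_wise_adagrad_step(param, task_grads, accumulators)
print(param)  # task_b still makes progress despite task_a's much larger gradients
```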
arXiv Detail & Related papers (2022-11-28T04:24:38Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
- Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing [53.399742232323895]
ATTEMPT is a new modular, multi-task, and parameter-efficient language model (LM) tuning approach.
It combines knowledge transferred across different tasks via a mixture of soft prompts while keeping the original LM unchanged.
It is parameter-efficient (e.g., updates 1,600 times fewer parameters than fine-tuning) and enables multi-task learning and flexible extensions.
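A minimal sketch of such an attentional mixture of source prompts follows; the mean-pooled query, dot-product scoring, and all names and shapes are simplifying assumptions, not ATTEMPT's exact architecture.

```python
import torch
import torch.nn as nn

class PromptMixture(nn.Module):
    """Frozen prompts pre-trained on several source tasks are mixed with
    instance-dependent attention weights and added to a trainable target-task
    prompt, while the backbone LM itself stays unchanged."""

    def __init__(self, source_prompts, prompt_len=50, hidden_dim=768):
        super().__init__()
        # (num_sources, prompt_len, hidden_dim), kept frozen.
        self.register_buffer("sources", source_prompts)
        self.target = nn.Parameter(torch.zeros(prompt_len, hidden_dim))
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, hidden_dim) from the frozen LM's
        # embedding layer; mean-pool into one query vector per instance.
        query = self.query_proj(input_embeds.mean(dim=1))          # (B, H)
        keys = self.sources.mean(dim=1)                            # (S, H)
        attn = torch.softmax(query @ keys.T, dim=-1)               # (B, S)
        mixed = torch.einsum("bs,slh->blh", attn, self.sources)    # (B, L, H)
        return mixed + self.target                                 # broadcast over batch


sources = torch.randn(6, 50, 768)             # e.g. 6 prompts from source tasks
module = PromptMixture(sources)
print(module(torch.randn(4, 128, 768)).shape) # torch.Size([4, 50, 768])
```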
arXiv Detail & Related papers (2022-05-24T10:48:33Z)
- Analysis and Prediction of NLP Models Via Task Embeddings [25.311690222754454]
We propose MetaEval, a collection of 101 NLP tasks.
We fit a single transformer to all MetaEval tasks jointly while conditioning it on learned embeddings.
The resulting task embeddings enable a novel analysis of the space of tasks.
arXiv Detail & Related papers (2021-12-10T16:23:24Z)
- Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM) which alternately learns the low-dimensional subspace and the regressors.
We show that for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
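The underlying linear meta-learning model assumes each task t observes y_t ≈ X_t U w_t with a shared low-dimensional subspace U. The toy NumPy loop below illustrates the alternating minimization on synthetic data only; it does not reproduce MLLAM's exact updates, task subset selection, or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, tasks, n_per_task = 50, 3, 30, 20

# Synthetic tasks sharing a k-dimensional subspace U_true of R^d.
U_true = np.linalg.qr(rng.normal(size=(d, k)))[0]
data = []
for _ in range(tasks):
    X = rng.normal(size=(n_per_task, d))
    w = rng.normal(size=k)
    data.append((X, X @ U_true @ w + 0.01 * rng.normal(size=n_per_task)))

# Alternate: fix U and solve each task's low-dimensional regressor by least
# squares, then fix the regressors and solve for U.
U = np.linalg.qr(rng.normal(size=(d, k)))[0]            # random init
for it in range(25):
    W = [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in data]
    # U-step: vec(U) solves one stacked least-squares problem, since
    # X @ U @ w = kron(X, w) @ vec(U) for a row-major vec.
    A = np.vstack([np.kron(X, w) for (X, _), w in zip(data, W)])
    b = np.concatenate([y for _, y in data])
    U = np.linalg.lstsq(A, b, rcond=None)[0].reshape(d, k)
    U = np.linalg.qr(U)[0]                               # re-orthonormalize

# Distance between estimated and true subspaces (small if recovery succeeds).
print(np.linalg.norm(U @ U.T - U_true @ U_true.T))
```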
arXiv Detail & Related papers (2021-05-18T06:46:48Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
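A simplified single-layer sketch of the diff-pruning idea follows, using a plain L1 penalty plus a final magnitude projection as stand-ins for the paper's relaxed-L0 sparsity machinery; all names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class DiffPrunedLinear(nn.Module):
    """The pre-trained weights stay frozen and a task-specific difference
    tensor is learned on top; a sparsity penalty keeps most of its entries
    (close to) zero, so only the non-zero diff must be stored per task."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.register_buffer("weight0", linear.weight.detach().clone())
        self.register_buffer("bias0", linear.bias.detach().clone())
        self.diff = nn.Parameter(torch.zeros_like(self.weight0))

    def forward(self, x):
        return nn.functional.linear(x, self.weight0 + self.diff, self.bias0)

    def sparsity_penalty(self):
        # Simplified L1 stand-in for the paper's relaxed-L0 penalty.
        return self.diff.abs().sum()


layer = DiffPrunedLinear(nn.Linear(768, 768))
opt = torch.optim.Adam([layer.diff], lr=1e-3)
x, target = torch.randn(8, 768), torch.randn(8, 768)
for _ in range(10):                           # toy task loss
    loss = nn.functional.mse_loss(layer(x), target) + 1e-4 * layer.sparsity_penalty()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Project onto a fixed sparsity budget by magnitude (keep ~0.5% of the entries).
with torch.no_grad():
    keep = int(0.005 * layer.diff.numel())
    cutoff = layer.diff.abs().flatten().kthvalue(layer.diff.numel() - keep).values
    layer.diff[layer.diff.abs() <= cutoff] = 0.0
print((layer.diff != 0).float().mean())       # roughly 0.005 of the weights differ
```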
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.