Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks
- URL: http://arxiv.org/abs/2204.04596v2
- Date: Wed, 13 Apr 2022 10:28:13 GMT
- Title: Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks
- Authors: Haoran Yang, Piji Li, Wai Lam
- Abstract summary: We propose a simple tuning method which only introduces three trainable vectors.
We input the integrated hidden state(s) to a task-specific linear classifier to predict categories.
This scheme is similar to the way ELMo utilises hidden states, except that ELMo feeds the hidden states into LSTM-based models.
- Score: 49.807185872741066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient tuning aims to distill knowledge for downstream tasks by
optimizing a few introduced parameters while freezing the pretrained language models (PLMs).
Continuous prompt tuning, which prepends a few trainable vectors to the input embeddings, is
one such method and has drawn much attention due to its effectiveness and efficiency. This
family of methods can be viewed as applying nonlinear transformations to the hidden states
inside PLMs. However, a natural question has been overlooked: can the hidden states be used
directly for classification without changing them? In this paper, we aim to answer this
question by proposing a simple tuning method that introduces only three trainable vectors.
First, we integrate the hidden states of all layers using the introduced vectors. Then, we
feed the integrated hidden state(s) into a task-specific linear classifier to predict
categories. This scheme is similar to the way ELMo utilises hidden states, except that ELMo
feeds the hidden states into LSTM-based models. Although our proposed tuning scheme is simple,
it achieves performance comparable to prompt tuning methods such as P-tuning and P-tuning v2,
verifying that the original hidden states do contain useful information for classification
tasks. Moreover, our method has an advantage over prompt tuning in terms of both time and the
number of parameters.
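To make the description above more concrete, below is a minimal PyTorch sketch of the general recipe: an ELMo-style mix of frozen per-layer hidden states followed by a task-specific linear classifier. The abstract does not spell out the exact roles of the three trainable vectors, so the layer-weight vector and token-pooling vector here (plus the classifier) are illustrative assumptions, not the authors' exact parameterization.

```python
import torch
import torch.nn as nn


class HiddenStateMixClassifier(nn.Module):
    """Classify from the frozen hidden states of all PLM layers; only this head is trained."""

    def __init__(self, num_states: int, hidden_size: int, num_classes: int):
        super().__init__()
        # Trainable vector 1: per-layer mixing weights (ELMo-style scalar mix).
        self.layer_weights = nn.Parameter(torch.zeros(num_states))
        # Trainable vector 2: attention-style pooling query over tokens (an assumption).
        self.token_query = nn.Parameter(torch.zeros(hidden_size))
        # Task-specific linear classifier on top of the integrated hidden state.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, all_hidden_states, attention_mask):
        # all_hidden_states: tuple of (batch, seq_len, hidden) tensors, one per layer.
        stacked = torch.stack(tuple(all_hidden_states), dim=0)   # (L, B, T, H)
        w = torch.softmax(self.layer_weights, dim=0)             # (L,)
        mixed = (w[:, None, None, None] * stacked).sum(dim=0)    # (B, T, H)
        # Pool over tokens, ignoring padding positions.
        scores = mixed @ self.token_query                        # (B, T)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        probs = torch.softmax(scores, dim=-1)                    # (B, T)
        pooled = (probs.unsqueeze(-1) * mixed).sum(dim=1)        # (B, H)
        return self.classifier(pooled)                           # (B, num_classes)
```

In use, one would run the frozen PLM with hidden-state output enabled, pass the resulting tuple of hidden states and the attention mask into this head, and train only the head's parameters.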
Related papers
- LoFiT: Localized Fine-tuning on LLM Representations [60.99814930367597]
We introduce a framework called Localized Fine-Tuning on LLM Representations (LoFiT).
LoFiT identifies a subset of attention heads that are most important for learning a specific task, then trains offset vectors that are added to the model's hidden representations at those selected heads (a rough sketch of this idea follows the list below).
For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention.
arXiv Detail & Related papers (2024-06-03T17:45:41Z)
- Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification [34.33544689818836]
We propose a tuning-free manifold-based space re-embedding method called Locally Linear Embedding with Intra-class Neighborhood Constraint.
Our approach further enhances prompt-based tuning by up to 3.2%.
arXiv Detail & Related papers (2023-09-08T07:42:29Z)
- On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers [47.77328392236625]
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure, where we first optimize the task-specific parameters and then train the classifier using the same selection procedure employed at inference time.
Our method achieves results that are either superior or on par with the state of the art while being computationally cheaper.
arXiv Detail & Related papers (2023-08-18T15:11:16Z)
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver an 18.73% performance improvement over zero-shot CLIP by masking an average of only 2.56% of the parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning [16.60284838029852]
We investigate whether one can make a task-specific selection of which subset of the layers to adapt.
We propose to select layers based on the variability of their hidden states given a task-specific corpus.
arXiv Detail & Related papers (2022-10-18T17:58:43Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
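As a companion to the LoFiT entry in the list above, here is a rough PyTorch sketch of the localized-offset idea: trainable bias vectors added to the hidden representations of a chosen subset of attention heads while the rest of the model stays frozen. The head-selection step and the exact point at which LoFiT applies its offsets are not given in the summary above, so the `selected_heads` argument and the `apply` call site are illustrative assumptions, not LoFiT's actual implementation.

```python
import torch
import torch.nn as nn


class HeadOffsets(nn.Module):
    """Trainable per-head offset vectors for a frozen transformer (LoFiT-style sketch)."""

    def __init__(self, selected_heads, head_dim):
        # selected_heads: list of (layer_idx, head_idx) pairs judged most task-relevant
        # by some prior selection step (assumed to be done elsewhere).
        super().__init__()
        self.selected_heads = list(selected_heads)
        self.offsets = nn.ParameterDict({
            f"{layer}_{head}": nn.Parameter(torch.zeros(head_dim))
            for layer, head in self.selected_heads
        })

    def apply(self, layer_idx, head_outputs):
        # head_outputs: (batch, seq_len, num_heads, head_dim) from one attention layer.
        # Offsets are added only at the selected heads of the matching layer.
        num_heads = head_outputs.size(2)
        out = head_outputs
        for layer, head in self.selected_heads:
            if layer != layer_idx:
                continue
            mask = torch.zeros(num_heads, 1, device=out.device, dtype=out.dtype)
            mask[head] = 1.0
            # (num_heads, 1) * (head_dim,) broadcasts to (num_heads, head_dim).
            out = out + mask * self.offsets[f"{layer}_{head}"]
        return out
```

In practice one would call `apply` from a forward hook (or a small patch of the attention module) at each selected layer and train only the offset parameters.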