On Surgical Fine-tuning for Language Encoders
- URL: http://arxiv.org/abs/2310.17041v1
- Date: Wed, 25 Oct 2023 22:42:30 GMT
- Title: On Surgical Fine-tuning for Language Encoders
- Authors: Abhilasha Lodha, Gayatri Belapurkar, Saloni Chalkapurkar, Yuanming
Tao, Reshmi Ghosh, Samyadeep Basu, Dmitrii Petrov, Soundararajan Srinivasan
- Abstract summary: We show that for different downstream language tasks, fine-tuning only a subset of layers is sufficient to obtain performance that is close to and often better than fine-tuning all the layers in the language encoder.
We propose an efficient metric based on the diagonal of the Fisher information matrix (FIM score) to select the candidate layers for selective fine-tuning.
- Score: 2.3796105472622813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning all the layers of a pre-trained neural language encoder (either
using all the parameters or using parameter-efficient methods) is often the
de facto way of adapting it to a new task. We show evidence that for different
downstream language tasks, fine-tuning only a subset of layers is sufficient to
obtain performance that is close to and often better than fine-tuning all the
layers in the language encoder. We propose an efficient metric based on the
diagonal of the Fisher information matrix (FIM score) to select the candidate
layers for selective fine-tuning. We show, empirically on GLUE and SuperGLUE
tasks and across distinct language encoders, that this metric can effectively
select layers leading to a strong downstream performance. Our work highlights
that task-specific information corresponding to a given downstream task is
often localized within a few layers, and tuning only those is sufficient for
strong performance. Additionally, we demonstrate the robustness of the FIM
score: the layer ranking it induces remains stable throughout the optimization
process.
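To make the FIM-score idea concrete, the following is a minimal sketch (not the authors' released code) of how a per-layer score based on the diagonal empirical Fisher could be computed for a Hugging Face style classifier, using squared gradients of the training loss; the parameter-name pattern, the handling of the classifier head, and the top-k selection are illustrative assumptions.

```python
import re
from collections import defaultdict

import torch

def fim_layer_scores(model, dataloader, device="cpu"):
    """Approximate a per-layer FIM score as the mean squared gradient of the
    training loss (empirical diagonal Fisher), assuming a Hugging Face style
    model that returns `.loss` when labels are included in the batch."""
    model.to(device).train()
    sq_grads, counts = defaultdict(float), defaultdict(int)
    for batch in dataloader:                       # batches are dicts of tensors
        batch = {k: v.to(device) for k, v in batch.items()}
        model.zero_grad()
        model(**batch).loss.backward()
        for name, param in model.named_parameters():
            match = re.search(r"layer\.(\d+)\.", name)   # e.g. encoder.layer.7.
            if param.grad is None or match is None:
                continue
            layer = int(match.group(1))
            sq_grads[layer] += param.grad.detach().pow(2).sum().item()
            counts[layer] += param.numel()
    return {layer: sq_grads[layer] / counts[layer] for layer in sq_grads}

def unfreeze_top_k(model, layer_scores, k=3):
    """Freeze the encoder and unfreeze only the k highest-scoring layers;
    a classification head (if present) is kept trainable."""
    top = set(sorted(layer_scores, key=layer_scores.get, reverse=True)[:k])
    for name, param in model.named_parameters():
        match = re.search(r"layer\.(\d+)\.", name)
        if match is None:
            param.requires_grad = "classifier" in name
        else:
            param.requires_grad = int(match.group(1)) in top
    return sorted(top)
```

Calling `unfreeze_top_k(model, fim_layer_scores(model, loader), k=3)` would then restrict gradient updates to the three highest-scoring layers, which is the surgical fine-tuning setting the abstract describes.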
Related papers
- Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning [0.08795040582681389]
Large Language Models (LLMs) have gained significant popularity in recent years for specialized tasks addressed through prompts.
We propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens.
Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding.
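The summary does not say how "meaningful words" enter the tuning; one plausible reading, sketched below with Hugging Face `transformers`, is to initialize the trainable prompt vectors from the embeddings of a task-related phrase rather than from random tokens. The model name and seed phrase are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical sketch: build trainable soft-prompt vectors from the embeddings
# of a meaningful, task-related phrase instead of random tokens.
model_name = "roberta-base"                      # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

seed_phrase = "classify the sentiment of this movie review"
ids = tokenizer(seed_phrase, add_special_tokens=False, return_tensors="pt").input_ids[0]
with torch.no_grad():
    init = model.get_input_embeddings()(ids)     # (prompt_len, hidden_size)

soft_prompt = torch.nn.Parameter(init.clone())   # only these vectors are trained
for p in model.parameters():                     # the backbone stays frozen
    p.requires_grad = False
```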
arXiv Detail & Related papers (2024-10-11T07:55:09Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models [55.45444773200529]
Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
arXiv Detail & Related papers (2024-04-14T19:45:35Z)
- Joint Prompt Optimization of Stacked LLMs using Variational Inference [66.04409787899583]
Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences.
By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN).
We show that DLN-2 can reach higher performance than a single layer, suggesting that performance comparable to GPT-4 may be within reach.
arXiv Detail & Related papers (2023-06-21T18:45:56Z)
- Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning [32.84435258519842]
We propose Adaptive Prefix Tuning (APT) to adjust the prefix in terms of both fine-grained token level and coarse-grained layer level with a gate mechanism.
Experiments on the SuperGLUE and NER datasets show the effectiveness of APT.
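As a rough illustration of the gate mechanism described above, the sketch below combines a token-level gate and a layer-level gate over learned prefixes; it is a hypothetical reconstruction, not the APT reference implementation.

```python
import torch
import torch.nn as nn

class GatedPrefix(nn.Module):
    """Hypothetical gated prefix module: a per-layer prefix scaled by a
    fine-grained token-level gate and a coarse-grained layer-level gate."""

    def __init__(self, num_layers: int, prefix_len: int, hidden_size: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(num_layers, prefix_len, hidden_size) * 0.02)
        self.token_gate = nn.Parameter(torch.zeros(num_layers, prefix_len, 1))
        self.layer_gate = nn.Parameter(torch.zeros(num_layers, 1, 1))

    def forward(self, layer_idx: int) -> torch.Tensor:
        # Sigmoid keeps both gates in (0, 1); the result would be prepended to
        # the key/value states of the corresponding encoder layer.
        g_token = torch.sigmoid(self.token_gate[layer_idx])
        g_layer = torch.sigmoid(self.layer_gate[layer_idx])
        return g_layer * g_token * self.prefix[layer_idx]
```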
arXiv Detail & Related papers (2023-05-24T14:51:01Z)
- Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning [16.60284838029852]
We investigate whether one could make a task-specific selection on which subset of the layers to adapt.
We propose to select layers based on the variability of their hidden states given a task-specific corpus.
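A toy sketch of this idea, under the assumption that "variability" is measured as the variance of a layer's hidden states over a small task corpus (the cited paper defines its own criterion), could look like this:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative encoder
enc = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

corpus = ["The premise entails the hypothesis.", "This movie was wonderful."]
with torch.no_grad():
    batch = tok(corpus, padding=True, return_tensors="pt")
    hidden = enc(**batch).hidden_states       # embeddings + one tensor per layer

# Variance over examples and tokens, averaged over hidden dimensions
# (padding tokens are included here for brevity).
variability = [h.var(dim=(0, 1)).mean().item() for h in hidden[1:]]
ranked = sorted(range(len(variability)), key=lambda i: variability[i], reverse=True)
print("layers ranked by hidden-state variability:", ranked)
```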
arXiv Detail & Related papers (2022-10-18T17:58:43Z)
- Probing for Understanding of English Verb Classes and Alternations in Large Pre-trained Language Models [4.243426191555036]
We investigate the extent to which verb alternation classes are encoded in the embeddings of Large Pre-trained Language Models.
We find that contextual embeddings from PLMs achieve astonishingly high accuracies on tasks across most classes.
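Such probing studies typically train a lightweight classifier on frozen contextual embeddings; the toy sketch below uses hypothetical sentences and labels to show the general recipe, not the paper's actual data or verb classes.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative encoder
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

sentences = ["The window broke.", "John broke the window.",
             "The vase shattered.", "Mary shattered the vase."]
labels = [0, 1, 0, 1]          # toy labels, e.g. intransitive vs. transitive use

with torch.no_grad():
    batch = tok(sentences, padding=True, return_tensors="pt")
    feats = enc(**batch).last_hidden_state[:, 0].numpy()   # frozen [CLS] features

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("probe accuracy on the toy data:", probe.score(feats, labels))
```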
arXiv Detail & Related papers (2022-09-11T08:04:40Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning [7.906608953906889]
We introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network.
We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model.
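A minimal sketch of the general idea of fitting a GBDT head on encoder features that fine-tuning computes anyway, here with scikit-learn's gradient boosting and toy NLI-style pairs (illustrative assumptions, not the FreeGBDT implementation):

```python
import numpy as np
import torch
from sklearn.ensemble import GradientBoostingClassifier
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")        # illustrative encoder
enc = AutoModel.from_pretrained("roberta-base").eval()

premises = ["A man plays guitar.", "A man plays guitar.",
            "Two dogs run in a park.", "Two dogs run in a park."]
hypotheses = ["Someone is making music.", "Nobody is making any sound.",
              "Some animals are outdoors.", "The dogs are asleep indoors."]
labels = np.array([0, 1, 0, 1])                            # toy entailment labels

with torch.no_grad():
    batch = tok(premises, hypotheses, padding=True, return_tensors="pt")
    feats = enc(**batch).last_hidden_state[:, 0].numpy()   # pooled <s> features

# Fit a GBDT head on features the network computes anyway during fine-tuning.
gbdt_head = GradientBoostingClassifier(n_estimators=50).fit(feats, labels)
print(gbdt_head.predict(feats))
```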
arXiv Detail & Related papers (2021-05-08T22:31:51Z)
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view, and representations from the other encoder layers are supplemented to give a stereoscopic view of the source sequence.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)