Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed
Representations
- URL: http://arxiv.org/abs/2211.08794v4
- Date: Fri, 26 May 2023 16:47:18 GMT
- Title: Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed
Representations
- Authors: Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq Joty, Luo Si,
Lidong Bing
- Abstract summary: Fine-tuning of pretrained language models (PLMs) is prone to overfitting in low-resource scenarios.
We present a novel method that operates on the hidden representations of a PLM to reduce overfitting.
- Score: 51.75960511842552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to their huge number of parameters, fine-tuning of pretrained language
models (PLMs) is prone to overfitting in low-resource scenarios. In this
work, we present a novel method that operates on the hidden representations of
a PLM to reduce overfitting. During fine-tuning, our method inserts random
autoencoders between the hidden layers of a PLM, which transform activations
from the previous layers into multi-view compressed representations before
feeding them into the upper layers. The autoencoders are plugged out after
fine-tuning, so our method does not add extra parameters or increase
computation cost during inference. Our method demonstrates promising
performance improvement across a wide range of sequence- and token-level
low-resource NLP tasks.
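A minimal PyTorch sketch of the idea in the abstract (not the authors' released code): during fine-tuning, a transformer layer's output is routed through one of several small, randomly chosen autoencoders before reaching the next layer, and the autoencoders are discarded at inference. Class names, bottleneck widths, and the use of different bottlenecks as the multiple "views" are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Compresses hidden states into a low-dimensional view and reconstructs them."""
    def __init__(self, hidden_size: int, bottleneck: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, bottleneck)
        self.decoder = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states):
        return self.decoder(torch.relu(self.encoder(hidden_states)))

class LayerWithRandomAE(nn.Module):
    """Wraps a transformer layer; in training mode its output is routed through a
    randomly chosen autoencoder before being fed to the next layer."""
    def __init__(self, layer: nn.Module, hidden_size: int, bottlenecks=(32, 64, 128)):
        super().__init__()
        self.layer = layer
        self.aes = nn.ModuleList([BottleneckAutoencoder(hidden_size, b) for b in bottlenecks])

    def forward(self, hidden_states, *args, **kwargs):
        outputs = self.layer(hidden_states, *args, **kwargs)
        if not self.training:            # autoencoders are "plugged out" at inference
            return outputs
        hs = outputs[0] if isinstance(outputs, tuple) else outputs
        idx = torch.randint(len(self.aes), (1,)).item()
        hs = self.aes[idx](hs)
        return (hs,) + outputs[1:] if isinstance(outputs, tuple) else hs
```

After fine-tuning, each wrapper is simply replaced by its inner `layer` (e.g., restoring `model.encoder.layer[i]` in a BERT-style encoder), so no extra parameters or compute remain at inference.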
Related papers
- SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers [88.68985153780514]
We propose a new selective PEFT method, namely SparseGrad, that performs well on MLP blocks.
We apply SparseGrad to fine-tune BERT and RoBERTa for the NLU task and LLaMa-2 for the Question-Answering task.
arXiv Detail & Related papers (2024-10-09T19:03:52Z)
- LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs), but its memory footprint grows with sequence length.
Existing approaches to mitigate this issue include efficient attention variants integrated in upcycling stages and KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
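As a rough illustration only (a generic truncated-SVD sketch, not LoRC's algorithm or its progressive, layer-wise strategy): a key-projection weight can be factorized into two thin matrices so that only a rank-r code per token would need to be cached. The rank, dimensions, and variable names are arbitrary assumptions.

```python
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate a (d_out x d_in) projection weight as B @ A, with
    A: (rank, d_in) and B: (d_out, rank), via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = torch.diag(S[:rank]) @ Vh[:rank]   # (rank, d_in)
    B = U[:, :rank]                        # (d_out, rank)
    return A, B

# Example: compress a key-projection weight and cache only the rank-r code per token.
d_model, rank = 768, 64
w_k = torch.randn(d_model, d_model)
A, B = low_rank_factorize(w_k, rank)

x = torch.randn(10, d_model)               # 10 token hidden states
code = x @ A.T                             # (10, rank)   -> what would be cached
keys = code @ B.T                          # (10, d_model) approximate keys
print(torch.norm(keys - x @ w_k.T) / torch.norm(x @ w_k.T))  # relative approximation error
```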
arXiv Detail & Related papers (2024-10-04T03:10:53Z)
- Multi-Prompting Decoder Helps Better Language Understanding [23.084538462710125]
We propose a simple yet effective Multi-Prompting Decoder (MPD) framework for Model-as-a-Service (MaaS) adaptation.
Our method achieves new state-of-the-art results on multiple natural language understanding datasets under the few-shot setting.
arXiv Detail & Related papers (2024-06-10T13:58:46Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models [9.244526043014098]
Large language models (LLMs) show excellent performance on difficult tasks, but they often require massive memory and computational resources.
In this study, we make the important observation that the multi-head self-attention (MHA) sub-layer of the Transformer exhibits noticeable low-rank structure.
We propose a mixed compression model, which organically combines Low-Rank matrix approximation And structured Pruning (LoRAP).
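The structured-pruning half of such a mixed scheme could be sketched as below; the neuron-importance score (weight norms) and keep ratio are illustrative placeholders, not LoRAP's actual criteria, and the low-rank half would factorize the MHA weights separately.

```python
import torch
import torch.nn as nn

def prune_ffn(up: nn.Linear, down: nn.Linear, keep_ratio: float = 0.75):
    """Structured pruning of an FFN block: drop the intermediate neurons whose
    combined input/output weight norm is smallest (placeholder importance score)."""
    # up.weight: (d_ff, d_model), down.weight: (d_model, d_ff)
    importance = up.weight.norm(dim=1) * down.weight.norm(dim=0)     # (d_ff,)
    k = int(keep_ratio * importance.numel())
    keep = importance.topk(k).indices.sort().values                  # kept neuron indices

    new_up = nn.Linear(up.in_features, k, bias=up.bias is not None)
    new_down = nn.Linear(k, down.out_features, bias=down.bias is not None)
    new_up.weight.data = up.weight.data[keep].clone()
    new_down.weight.data = down.weight.data[:, keep].clone()
    if up.bias is not None:
        new_up.bias.data = up.bias.data[keep].clone()
    if down.bias is not None:
        new_down.bias.data = down.bias.data.clone()
    return new_up, new_down
```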
arXiv Detail & Related papers (2024-04-15T11:53:22Z)
- Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning [42.652021176354644]
The memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings.
We find that a large proportion of the vocabulary is unused in downstream tasks, and propose a simple yet effective approach that leverages this to minimize the memory footprint of the embedding matrix.
We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks.
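A hedged sketch of this style of embedding pruning with Hugging Face Transformers (the model name, toy corpus, and remapping strategy are assumptions, not the paper's implementation): only the vocabulary rows that occur in the downstream data are kept.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

corpus = ["an example sentence", "another training example"]   # the downstream data
used_ids = sorted(
    {i for text in corpus for i in tokenizer(text)["input_ids"]} | set(tokenizer.all_special_ids)
)

# Keep only the embedding rows for tokens that actually appear in the task data.
old_emb = model.get_input_embeddings()
new_emb = torch.nn.Embedding(len(used_ids), old_emb.embedding_dim)
new_emb.weight.data = old_emb.weight.data[used_ids].clone()
model.set_input_embeddings(new_emb)
model.config.vocab_size = len(used_ids)

# Token ids must then be remapped into the reduced vocabulary before each forward pass.
id_map = {old: new for new, old in enumerate(used_ids)}
```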
arXiv Detail & Related papers (2023-09-15T19:00:00Z)
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we manage to deliver an 18.73% performance improvement compared to zero-shot CLIP by masking an average of only 2.56% of the parameters.
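A simplified sketch of masking frozen pre-trained weights through a learnable selection; the sigmoid relaxation and L1-style penalty below stand in for the paper's regularized masking and are assumptions, not the released method.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Frozen pre-trained linear layer whose weights are gated by a learnable mask;
    only the mask logits receive gradients during downstream tuning."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.data.clone(), requires_grad=False)
        self.bias = (None if linear.bias is None
                     else nn.Parameter(linear.bias.data.clone(), requires_grad=False))
        self.mask_logits = nn.Parameter(torch.full_like(self.weight, 3.0))  # start ~ all kept

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits)            # soft mask in (0, 1)
        return nn.functional.linear(x, self.weight * mask, self.bias)

    def sparsity_penalty(self):
        # Regularizer added to the task loss to push most mask entries toward zero.
        return torch.sigmoid(self.mask_logits).mean()
```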
arXiv Detail & Related papers (2023-07-27T17:56:05Z)
- Just CHOP: Embarrassingly Simple LLM Compression [27.64461490974072]
Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint.
We show that simple layer pruning coupled with extended language model pretraining produces state-of-the-art results compared to structured and even semi-structured compression of models at the 7B scale.
We also show how distillation, which has been highly effective in task-agnostic compression of smaller BERT-style models, becomes inefficient against our simple pruning technique.
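Layer pruning itself is straightforward with Hugging Face Transformers; the sketch below drops the last blocks of GPT-2 purely as an illustration (the model, how many layers to keep, and which end to chop are assumptions, and the paper additionally continues language model pretraining afterwards).

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

keep = 8  # keep the first 8 of GPT-2's 12 transformer blocks
model.transformer.h = nn.ModuleList(list(model.transformer.h)[:keep])
model.config.n_layer = keep

print(sum(p.numel() for p in model.parameters()))  # parameter count after pruning
```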
arXiv Detail & Related papers (2023-05-24T08:18:35Z)
- NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better [98.5705258907774]
Finetuning pretrained language models (PLMs) is critical for their success in downstream tasks.
PLMs risk overfitting to the pretraining signals, and there are gaps between the pretraining tasks and downstream tasks.
NoisyTune helps finetune PLMs better on downstream tasks by adding a small amount of noise to their parameters before finetuning.
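A minimal sketch of this kind of perturbation, assuming uniform noise scaled by each parameter matrix's own standard deviation (the noise form and scale are assumptions based on the summary, not the exact NoisyTune recipe).

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

noise_scale = 0.15  # illustrative value
with torch.no_grad():
    for name, param in model.named_parameters():
        if "classifier" in name:      # leave the freshly initialized task head untouched
            continue
        # Uniform noise in (-1, 1), scaled by the parameter matrix's standard deviation.
        noise = (torch.rand_like(param) * 2 - 1) * noise_scale * param.std()
        param.add_(noise)
# ...then fine-tune on the downstream task as usual.
```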
arXiv Detail & Related papers (2022-02-24T11:08:02Z)
- I-Tuning: Tuning Language Models with Image for Caption Generation [9.511101155155957]
We propose a new perspective to tune the frozen PLM with images for caption generation.
We denote our method as I-Tuning; it automatically filters visual information from images to adjust the output hidden states of the PLM.
arXiv Detail & Related papers (2022-02-14T09:36:50Z)