Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for
Parameter-Efficient BERT
- URL: http://arxiv.org/abs/2307.11764v2
- Date: Thu, 31 Aug 2023 17:09:23 GMT
- Title: Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for
Parameter-Efficient BERT
- Authors: Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, Sairam
Sundaresan
- Abstract summary: We present Sensi-BERT, a sensitivity-driven, efficient fine-tuning method for BERT models on downstream tasks.
Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, SST-2 and SQuAD.
- Score: 6.029590006321152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained language models have recently gained significant traction
due to their improved performance on various downstream tasks such as text
classification and question answering, requiring only a few epochs of
fine-tuning. However, their large model sizes often prohibit their deployment
on resource-constrained edge devices. Existing approaches to producing
parameter-efficient BERT models largely rely on compute-intensive training and
fine-tuning, and they often depend on additional compute-heavy models to
mitigate the performance gap. In this paper, we present Sensi-BERT, a
sensitivity-driven, efficient fine-tuning method that can take an
off-the-shelf pre-trained BERT model and yield highly parameter-efficient
models for downstream tasks. In particular, we perform a sensitivity analysis
to rank each parameter tensor and then use this ranking to trim the tensors
during fine-tuning for a given parameter or FLOPs budget. Our experiments show
the efficacy of Sensi-BERT across different downstream tasks including MNLI,
QQP, QNLI, SST-2 and SQuAD, demonstrating better performance at a similar or
smaller parameter budget than various alternatives.
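The abstract describes ranking each parameter tensor by sensitivity and trimming the tensors during fine-tuning to meet a parameter or FLOPs budget. The PyTorch sketch below is only one way such a scheme could look: the first-order |w * dL/dw| sensitivity proxy, the per-tensor magnitude masking, and the `loss_fn`/`batch` names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): sensitivity-driven trimming of
# BERT parameter tensors under a global budget. Assumes a first-order |w * dL/dw|
# sensitivity proxy and simple per-tensor magnitude masking.
import torch

def tensor_sensitivities(model, loss_fn, batch):
    """Score each weight tensor by sum |w * dL/dw| on one calibration batch."""
    model.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None and p.dim() >= 2:          # rank weight matrices only
            scores[name] = (p.detach() * p.grad.detach()).abs().sum().item()
    model.zero_grad()
    return scores

def trim_to_budget(model, scores, keep_ratio=0.5):
    """Mask entries of the least sensitive tensors to meet a parameter budget."""
    params = dict(model.named_parameters())
    ranked = sorted(scores, key=scores.get)              # least sensitive first
    n_trim = int(len(ranked) * (1.0 - keep_ratio))       # how many tensors to trim
    masks = {}
    for name in ranked[:n_trim]:
        p = params[name]
        n_keep = max(1, int(p.numel() * keep_ratio))     # entries kept in this tensor
        n_drop = p.numel() - n_keep
        if n_drop <= 0:
            continue
        thresh = p.detach().abs().flatten().kthvalue(n_drop).values
        masks[name] = (p.detach().abs() > thresh).float()
        p.data.mul_(masks[name])                         # zero out trimmed entries
    return masks
```

In a full fine-tuning loop, the returned masks would be reapplied after every optimizer step so that the trimmed entries stay at zero and the parameter/FLOPs budget is preserved.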
Related papers
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that effectively adapts large pre-trained models to downstream tasks.
We propose a novel approach that employs a low-rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance (a standard LoRA-style low-rank update is sketched after this list).
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves substantial improvements in both performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models [29.140036130469042]
We present PATS (Perturbation According To Sensitivity), a noisy training mechanism that accounts for each parameter's importance to the downstream task.
Experiments on different tasks of the GLUE benchmark show that PATS consistently improves the fine-tuning of PLMs of different sizes.
arXiv Detail & Related papers (2022-10-22T10:05:14Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, in which one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning (a minimal scale-and-shift module is sketched after this list).
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but this also incurs a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse-activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
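For the LoRTA entry above: LoRTA builds on LoRA, which freezes the pre-trained weights and learns a low-rank additive update. The sketch below shows only the standard LoRA matrix form, not LoRTA's low-rank tensor parametrization; the initialization scheme and the r/alpha defaults are assumptions.

```python
# Minimal LoRA-style low-rank adapter (illustrative, not LoRTA itself).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze the pre-trained weight
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T   (only A and B are trained)
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

A BERT attention projection could then be wrapped as `LoRALinear(nn.Linear(768, 768), r=8)`, leaving only `A` and `B` trainable.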
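For the SSF entry above: the method trains only a per-feature scale and shift applied to the frozen backbone's features. A minimal sketch, assuming a simple learnable affine transform inserted after a frozen layer; the identity/zero initialization is an assumption consistent with the scale-and-shift idea.

```python
# Illustrative SSF-style module: the backbone stays frozen and only a per-feature
# scale (gamma) and shift (beta) are trained on top of its features.
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))       # scale, identity at init
        self.beta = nn.Parameter(torch.zeros(dim))       # shift, zero at init

    def forward(self, x):                                # x: (..., dim)
        return x * self.gamma + self.beta
```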