Efficient Fine-Tuning of Compressed Language Models with Learners
- URL: http://arxiv.org/abs/2208.02070v1
- Date: Wed, 3 Aug 2022 13:42:30 GMT
- Title: Efficient Fine-Tuning of Compressed Language Models with Learners
- Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J.
Clark, Brett H. Meyer, Warren J. Gross
- Abstract summary: We introduce Learner modules and priming, novel methods for fine-tuning BERT-based models.
Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores.
Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines.
- Score: 12.768368718187428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning BERT-based models is resource-intensive in memory, computation,
and time. While many prior works aim to improve inference efficiency via
compression techniques, e.g., pruning, these works do not explicitly address
the computational challenges of training to downstream tasks. We introduce
Learner modules and priming, novel methods for fine-tuning that exploit the
overparameterization of pre-trained language models to gain benefits in
convergence speed and resource utilization. Learner modules navigate the double
bind of 1) training efficiently by fine-tuning a subset of parameters, and 2)
training effectively by ensuring quick convergence and high metric scores. Our
results on DistilBERT demonstrate that learners perform on par with or surpass
the baselines. Learners train 7x fewer parameters than state-of-the-art methods
on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower
resource utilization.
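The abstract does not spell out the Learner-module architecture or the priming procedure, so the following is only a minimal sketch of the underlying idea of fine-tuning a small subset of parameters: the pre-trained DistilBERT backbone is frozen and only a small bottleneck head (an illustrative stand-in, not the authors' Learner module) is trained.

```python
# Minimal sketch of parameter-subset fine-tuning on a frozen DistilBERT.
# The bottleneck head below is an illustrative stand-in; the paper's actual
# Learner modules and priming schedule are not described in this abstract.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast

backbone = DistilBertModel.from_pretrained("distilbert-base-uncased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

# 1) Train efficiently: freeze every pre-trained weight.
for p in backbone.parameters():
    p.requires_grad = False

# 2) Train effectively: fit only a small bottleneck classifier on the [CLS] state.
hidden, bottleneck, num_labels = backbone.config.dim, 64, 2
head = nn.Sequential(nn.Linear(hidden, bottleneck), nn.GELU(), nn.Linear(bottleneck, num_labels))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(sentences, labels):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():              # frozen backbone: no gradients or stored activations
        cls = backbone(**batch).last_hidden_state[:, 0]
    loss = nn.functional.cross_entropy(head(cls), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

trainable = sum(p.numel() for p in head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"updating {trainable / total:.2%} of all parameters")
```

For example, train_step(["great movie", "terrible plot"], torch.tensor([1, 0])) runs one update while touching well under 1% of the model's parameters.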
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, alleviating the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
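A rough sketch of the early/late two-route layout summarized above, with a generic frozen encoder standing in for the pre-trained model and a similarity-based down-weighting standing in for the unspecified anti-redundancy operation: the early layers run without gradient tracking, their intermediate outputs are consolidated, and only a few late pre-trained layers take part in backpropagation.

```python
# Hedged sketch of the two-route idea; the consolidation rule here is an
# assumed stand-in, not the paper's anti-redundancy operation.
import torch
import torch.nn as nn

class EarlyLateTransfer(nn.Module):
    def __init__(self, layers: nn.ModuleList, num_late: int = 2):
        super().__init__()
        self.early = layers[:-num_late]   # early route: run without gradient tracking
        self.late = layers[-num_late:]    # late route: the only layers touched by backprop

    def consolidate(self, feats):
        # Stand-in anti-redundancy step: down-weight intermediate outputs that
        # are highly similar (i.e., redundant) to the others.
        stacked = torch.stack(feats)                        # (L, B, T, H)
        flat = nn.functional.normalize(stacked.flatten(1), dim=-1)
        redundancy = (flat @ flat.T).mean(dim=1)            # mean cosine similarity per layer
        weights = torch.softmax(-redundancy, dim=0)
        return (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

    def forward(self, x):
        feats = []
        with torch.no_grad():             # early route stores no activations for backprop
            for layer in self.early:
                x = layer(x)
                feats.append(x)
        x = self.consolidate(feats)
        for layer in self.late:           # peak training memory is limited to these layers
            x = layer(x)
        return x

# Toy usage with generic transformer encoder layers as the pre-trained stack.
layers = nn.ModuleList(nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
                       for _ in range(6))
out = EarlyLateTransfer(layers)(torch.randn(2, 10, 64))
```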
- Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement [24.108008515395458]
We propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency.
For average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art results, outperforming the second-best method by +1.59% and +1.99% respectively under 16 shots, with 30x fewer learnable parameters.
arXiv Detail & Related papers (2023-04-03T17:58:54Z)
- Differentiable Entailment for Parameter Efficient Few Shot Learning [0.0]
We propose a new technique for parameter efficient few shot learning.
We quantify the tradeoff between parameter efficiency and performance in the few-shot regime.
We propose a simple model agnostic approach that can be extended to any task.
arXiv Detail & Related papers (2023-01-31T00:31:11Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
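A minimal sketch of the instance-wise idea above, assuming a mean-pooled instance summary and a single linear prompt generator (the IPT paper's actual generator is not described in this summary): each input produces its own prompt vectors, which are prepended to the token embeddings before they enter a frozen encoder, and only the generator is trained.

```python
# Hedged sketch of instance-wise prompts: a small trainable generator maps each
# input instance to its own prompt vectors. The generator design is an assumption.
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    def __init__(self, hidden: int, prompt_len: int = 8):
        super().__init__()
        self.prompt_len = prompt_len
        self.proj = nn.Linear(hidden, prompt_len * hidden)   # the only trained weights

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Summarize the instance by mean-pooling its token embeddings,
        # then emit prompt_len instance-specific prompt vectors.
        summary = token_embeds.mean(dim=1)                                     # (B, H)
        prompts = self.proj(summary).view(-1, self.prompt_len, token_embeds.size(-1))
        return torch.cat([prompts, token_embeds], dim=1)                       # (B, P+T, H)

# Toy usage: embeddings for a batch of 4 sequences of length 16, hidden size 128.
gen = InstancePromptGenerator(hidden=128)
extended = gen(torch.randn(4, 16, 128))   # feed `extended` to a frozen encoder
print(extended.shape)                     # torch.Size([4, 24, 128])
```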
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)
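A hedged sketch of the parameter-efficient sparse-training idea above: the dense pre-trained weight stays frozen, and only two thin matrices are trained to refine a weight-importance score that decides which entries to prune. The importance rule and the straight-through masking below are illustrative assumptions, not necessarily PST's exact recipe.

```python
# Sketch of learning a sparsity pattern with few trainable parameters.
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8, sparsity: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)  # frozen
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        out_f, in_f = self.weight.shape
        self.A = nn.Parameter(torch.zeros(out_f, rank))       # trainable, rank << in_f, out_f
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        # Importance of each weight = data-free magnitude + learned low-rank term.
        importance = self.weight.abs() + self.A @ self.B
        k = int(self.sparsity * importance.numel())
        threshold = importance.flatten().kthvalue(k).values
        soft = importance - threshold                           # keeps a gradient path to A and B
        mask = (soft > 0).to(x.dtype) + soft - soft.detach()    # straight-through estimator (assumption)
        return nn.functional.linear(x, self.weight * mask, self.bias)

# Toy usage: prune half of a dense layer's weights while training only ~2% as many parameters.
layer = SparseLinear(nn.Linear(768, 768))
y = layer(torch.randn(4, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # 2 * 768 * 8 = 12288
```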
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
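As an illustration of the kind of comparison described above (not the paper's specific stochastic quasi-Newton algorithms or experiments), the sketch below drives PyTorch's built-in L-BFGS, a quasi-Newton method, against first-order Adam on a small regression problem.

```python
# Toy comparison of a quasi-Newton optimizer (L-BFGS) against Adam in PyTorch.
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)

def train(optimizer_name: str, steps: int = 20) -> float:
    model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
    loss_fn = nn.MSELoss()
    if optimizer_name == "lbfgs":
        opt = torch.optim.LBFGS(model.parameters(), lr=0.5, history_size=10)
    else:
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    def closure():
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        return loss

    for _ in range(steps):
        if optimizer_name == "lbfgs":
            opt.step(closure)          # L-BFGS re-evaluates the loss through the closure
        else:
            closure()
            opt.step()
    return loss_fn(model(X), y).item()

print("adam :", train("adam"))
print("lbfgs:", train("lbfgs"))
```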
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
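A small sketch of the adaptive length re-scaling (ALR) idea above, assuming the target length is the mean norm of the pre-trained base-class weights (the paper's exact rescaling rule is not given in this summary): predicted novel-class weight vectors are rescaled so their lengths match the scale of the base-class weights.

```python
# Hedged sketch of vector-length rescaling for novel classifier weights.
import torch

def adaptive_length_rescale(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    """novel_w: (N_novel, D) predicted classifier weights; base_w: (N_base, D) pre-trained weights."""
    base_norm = base_w.norm(dim=1).mean()             # target length taken from base weights
    novel_norm = novel_w.norm(dim=1, keepdim=True)    # current length of each novel vector
    return novel_w * (base_norm / novel_norm.clamp_min(1e-8))

# Toy usage: novel weights come out systematically shorter than base weights.
base = torch.randn(60, 256)
novel = 0.1 * torch.randn(5, 256)
rescaled = adaptive_length_rescale(novel, base)
print(base.norm(dim=1).mean(), novel.norm(dim=1).mean(), rescaled.norm(dim=1).mean())
```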
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
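As a hedged sketch of the dual-sparsity idea above (the exact decomposition used by DSEE is not detailed in this summary, and the inference-side pruning of the final weights is not shown), the fine-tuning update to a frozen pre-trained weight can be parameterized as a low-rank term plus a sparse correction on a small fixed support, so that only a handful of parameters are trained.

```python
# Sketch of a sparse-plus-low-rank update to a frozen pre-trained weight.
# The fixed random sparse support and the rank are illustrative assumptions.
import torch
import torch.nn as nn

class DualSparseLinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 4, update_density: float = 0.005):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)  # frozen W
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        out_f, in_f = self.weight.shape
        self.U = nn.Parameter(torch.randn(out_f, rank) * 0.01)   # low-rank update factors
        self.V = nn.Parameter(torch.zeros(rank, in_f))
        num = max(1, int(update_density * out_f * in_f))
        self.register_buffer("support", torch.randperm(out_f * in_f)[:num])
        self.S = nn.Parameter(torch.zeros(num))                  # sparse residual values

    def forward(self, x):
        delta = self.U @ self.V                                  # low-rank part of the update
        delta = delta.flatten().index_add(0, self.support, self.S).view(self.weight.shape)
        return nn.functional.linear(x, self.weight + delta, self.bias)

# Toy usage: the frozen layer holds 589,824 weights; only U, V, and S are trained.
layer = DualSparseLinear(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```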
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the generated information and is not responsible for any consequences of its use.