Efficient Fine-Tuning of Compressed Language Models with Learners
- URL: http://arxiv.org/abs/2208.02070v1
- Date: Wed, 3 Aug 2022 13:42:30 GMT
- Title: Efficient Fine-Tuning of Compressed Language Models with Learners
- Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J.
Clark, Brett H. Meyer, Warren J. Gross
- Abstract summary: We introduce Learner modules and priming, novel methods for fine-tuning BERT-based models.
Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores.
Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines.
- Score: 12.768368718187428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning BERT-based models is resource-intensive in memory, computation,
and time. While many prior works aim to improve inference efficiency via
compression techniques, e.g., pruning, these works do not explicitly address
the computational challenges of training to downstream tasks. We introduce
Learner modules and priming, novel methods for fine-tuning that exploit the
overparameterization of pre-trained language models to gain benefits in
convergence speed and resource utilization. Learner modules navigate the double
bind of 1) training efficiently by fine-tuning a subset of parameters, and 2)
training effectively by ensuring quick convergence and high metric scores. Our
results on DistilBERT demonstrate that learners perform on par with or surpass
the baselines. Learners train 7x fewer parameters than state-of-the-art methods
on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower
resource utilization.
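The abstract does not spell out the Learner-module architecture or the priming procedure, so the following is only a minimal sketch of the underlying idea of fine-tuning a small subset of parameters: the pre-trained DistilBERT backbone is frozen and only a small bottleneck head (an illustrative stand-in, not the authors' Learner module) is trained.

```python
# Minimal sketch of parameter-subset fine-tuning on a frozen DistilBERT.
# The bottleneck head below is an illustrative stand-in; the paper's actual
# Learner modules and priming schedule are not described in this abstract.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast

backbone = DistilBertModel.from_pretrained("distilbert-base-uncased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

# 1) Train efficiently: freeze every pre-trained weight.
for p in backbone.parameters():
    p.requires_grad = False

# 2) Train effectively: fit only a small bottleneck classifier on the [CLS] state.
hidden, bottleneck, num_labels = backbone.config.dim, 64, 2
head = nn.Sequential(nn.Linear(hidden, bottleneck), nn.GELU(), nn.Linear(bottleneck, num_labels))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(sentences, labels):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():              # frozen backbone: no gradients or stored activations
        cls = backbone(**batch).last_hidden_state[:, 0]
    loss = nn.functional.cross_entropy(head(cls), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

trainable = sum(p.numel() for p in head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"updating {trainable / total:.2%} of all parameters")
```

For example, train_step(["great movie", "terrible plot"], torch.tensor([1, 0])) runs one update while touching well under 1% of the model's parameters.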
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, alleviating the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
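A rough sketch of the early/late two-route layout summarized above, with a generic frozen encoder standing in for the pre-trained model and a similarity-based down-weighting standing in for the unspecified anti-redundancy operation: the early layers run without gradient tracking, their intermediate outputs are consolidated, and only a few late pre-trained layers take part in backpropagation.

```python
# Hedged sketch of the two-route idea; the consolidation rule here is an
# assumed stand-in, not the paper's anti-redundancy operation.
import torch
import torch.nn as nn

class EarlyLateTransfer(nn.Module):
    def __init__(self, layers: nn.ModuleList, num_late: int = 2):
        super().__init__()
        self.early = layers[:-num_late]   # early route: run without gradient tracking
        self.late = layers[-num_late:]    # late route: the only layers touched by backprop

    def consolidate(self, feats):
        # Stand-in anti-redundancy step: down-weight intermediate outputs that
        # are highly similar (i.e., redundant) to the others.
        stacked = torch.stack(feats)                        # (L, B, T, H)
        flat = nn.functional.normalize(stacked.flatten(1), dim=-1)
        redundancy = (flat @ flat.T).mean(dim=1)            # mean cosine similarity per layer
        weights = torch.softmax(-redundancy, dim=0)
        return (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

    def forward(self, x):
        feats = []
        with torch.no_grad():             # early route stores no activations for backprop
            for layer in self.early:
                x = layer(x)
                feats.append(x)
        x = self.consolidate(feats)
        for layer in self.late:           # peak training memory is limited to these layers
            x = layer(x)
        return x

# Toy usage with generic transformer encoder layers as the pre-trained stack.
layers = nn.ModuleList(nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
                       for _ in range(6))
out = EarlyLateTransfer(layers)(torch.randn(2, 10, 64))
```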
- Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement [24.108008515395458]
We propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency.
For average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art results, outperforming the second-best method by +1.59% and +1.99% respectively under 16 shots, with 30x fewer learnable parameters.
arXiv Detail & Related papers (2023-04-03T17:58:54Z)
- Differentiable Entailment for Parameter Efficient Few Shot Learning [0.0]
We propose a new technique for parameter efficient few shot learning.
We quantify the tradeoff between parameter efficiency and performance in the few-shot regime.
We propose a simple model agnostic approach that can be extended to any task.
arXiv Detail & Related papers (2023-01-31T00:31:11Z)
- Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances into the prompts.
IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
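A minimal sketch of the instance-wise idea above, assuming a mean-pooled instance summary and a single linear prompt generator (the IPT paper's actual generator is not described in this summary): each input produces its own prompt vectors, which are prepended to the token embeddings before they enter a frozen encoder, and only the generator is trained.

```python
# Hedged sketch of instance-wise prompts: a small trainable generator maps each
# input instance to its own prompt vectors. The generator design is an assumption.
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    def __init__(self, hidden: int, prompt_len: int = 8):
        super().__init__()
        self.prompt_len = prompt_len
        self.proj = nn.Linear(hidden, prompt_len * hidden)   # the only trained weights

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Summarize the instance by mean-pooling its token embeddings,
        # then emit prompt_len instance-specific prompt vectors.
        summary = token_embeds.mean(dim=1)                                     # (B, H)
        prompts = self.proj(summary).view(-1, self.prompt_len, token_embeds.size(-1))
        return torch.cat([prompts, token_embeds], dim=1)                       # (B, P+T, H)

# Toy usage: embeddings for a batch of 4 sequences of length 16, hidden size 128.
gen = InstancePromptGenerator(hidden=128)
extended = gen(torch.randn(4, 16, 128))   # feed `extended` to a frozen encoder
print(extended.shape)                     # torch.Size([4, 24, 128])
```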
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)
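A hedged sketch of the parameter-efficient sparse-training idea above: the dense pre-trained weight stays frozen, and only two thin matrices are trained to refine a weight-importance score that decides which entries to prune. The importance rule and the straight-through masking below are illustrative assumptions, not necessarily PST's exact recipe.

```python
# Sketch of learning a sparsity pattern with few trainable parameters.
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8, sparsity: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)  # frozen
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        out_f, in_f = self.weight.shape
        self.A = nn.Parameter(torch.zeros(out_f, rank))       # trainable, rank << in_f, out_f
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        # Importance of each weight = data-free magnitude + learned low-rank term.
        importance = self.weight.abs() + self.A @ self.B
        k = int(self.sparsity * importance.numel())
        threshold = importance.flatten().kthvalue(k).values
        soft = importance - threshold                           # keeps a gradient path to A and B
        mask = (soft > 0).to(x.dtype) + soft - soft.detach()    # straight-through estimator (assumption)
        return nn.functional.linear(x, self.weight * mask, self.bias)

# Toy usage: prune half of a dense layer's weights while training only ~2% as many parameters.
layer = SparseLinear(nn.Linear(768, 768))
y = layer(torch.randn(4, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # 2 * 768 * 8 = 12288
```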
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
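As an illustration of the kind of comparison described above (not the paper's specific stochastic quasi-Newton algorithms or experiments), the sketch below drives PyTorch's built-in L-BFGS, a quasi-Newton method, against first-order Adam on a small regression problem.

```python
# Toy comparison of a quasi-Newton optimizer (L-BFGS) against Adam in PyTorch.
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)

def train(optimizer_name: str, steps: int = 20) -> float:
    model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
    loss_fn = nn.MSELoss()
    if optimizer_name == "lbfgs":
        opt = torch.optim.LBFGS(model.parameters(), lr=0.5, history_size=10)
    else:
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    def closure():
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        return loss

    for _ in range(steps):
        if optimizer_name == "lbfgs":
            opt.step(closure)          # L-BFGS re-evaluates the loss through the closure
        else:
            closure()
            opt.step()
    return loss_fn(model(X), y).item()

print("adam :", train("adam"))
print("lbfgs:", train("lbfgs"))
```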
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
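A small sketch of the adaptive length re-scaling (ALR) idea above, assuming the target length is the mean norm of the pre-trained base-class weights (the paper's exact rescaling rule is not given in this summary): predicted novel-class weight vectors are rescaled so their lengths match the scale of the base-class weights.

```python
# Hedged sketch of vector-length rescaling for novel classifier weights.
import torch

def adaptive_length_rescale(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    """novel_w: (N_novel, D) predicted classifier weights; base_w: (N_base, D) pre-trained weights."""
    base_norm = base_w.norm(dim=1).mean()             # target length taken from base weights
    novel_norm = novel_w.norm(dim=1, keepdim=True)    # current length of each novel vector
    return novel_w * (base_norm / novel_norm.clamp_min(1e-8))

# Toy usage: novel weights come out systematically shorter than base weights.
base = torch.randn(60, 256)
novel = 0.1 * torch.randn(5, 256)
rescaled = adaptive_length_rescale(novel, base)
print(base.norm(dim=1).mean(), novel.norm(dim=1).mean(), rescaled.norm(dim=1).mean())
```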
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
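As a hedged sketch of the dual-sparsity idea above (the exact decomposition used by DSEE is not detailed in this summary, and the inference-side pruning of the final weights is not shown), the fine-tuning update to a frozen pre-trained weight can be parameterized as a low-rank term plus a sparse correction on a small fixed support, so that only a handful of parameters are trained.

```python
# Sketch of a sparse-plus-low-rank update to a frozen pre-trained weight.
# The fixed random sparse support and the rank are illustrative assumptions.
import torch
import torch.nn as nn

class DualSparseLinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 4, update_density: float = 0.005):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)  # frozen W
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        out_f, in_f = self.weight.shape
        self.U = nn.Parameter(torch.randn(out_f, rank) * 0.01)   # low-rank update factors
        self.V = nn.Parameter(torch.zeros(rank, in_f))
        num = max(1, int(update_density * out_f * in_f))
        self.register_buffer("support", torch.randperm(out_f * in_f)[:num])
        self.S = nn.Parameter(torch.zeros(num))                  # sparse residual values

    def forward(self, x):
        delta = self.U @ self.V                                  # low-rank part of the update
        delta = delta.flatten().index_add(0, self.support, self.S).view(self.weight.shape)
        return nn.functional.linear(x, self.weight + delta, self.bias)

# Toy usage: the frozen layer holds 589,824 weights; only U, V, and S are trained.
layer = DualSparseLinear(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```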
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the generated information and is not responsible for any consequences of its use.