Neural Architecture Search for Parameter-Efficient Fine-tuning of Large
Pre-trained Language Models
- URL: http://arxiv.org/abs/2305.16597v1
- Date: Fri, 26 May 2023 03:01:07 GMT
- Title: Neural Architecture Search for Parameter-Efficient Fine-tuning of Large
Pre-trained Language Models
- Authors: Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver
Steeg
- Abstract summary: We propose an efficient NAS method for learning PET architectures via structured and unstructured pruning.
We present experiments on GLUE demonstrating the effectiveness of our algorithm and discuss how PET architectural design choices affect performance in practice.
- Score: 25.33932250843436
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Parameter-efficient tuning (PET) methods fit pre-trained language models
(PLMs) to downstream tasks by either computing a small compressed update for a
subset of model parameters, or appending and fine-tuning a small number of new
model parameters to the pre-trained network. Hand-designed PET architectures
from the literature perform well in practice, but have the potential to be
improved via automated neural architecture search (NAS). We propose an
efficient NAS method for learning PET architectures via structured and
unstructured pruning. We present experiments on GLUE demonstrating the
effectiveness of our algorithm and discuss how PET architectural design choices
affect performance in practice.
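To make the search-via-pruning idea concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' released code): a frozen pre-trained linear layer is augmented with a low-rank PET update whose rank components carry learnable gates (structured pruning) and whose individual entries carry element-wise gates (unstructured pruning); a sparsity penalty drives unused gates toward zero so the corresponding structure can be pruned from the final architecture.
```python
# Hypothetical sketch of NAS-via-pruning for a PET module (not the paper's code).
# A frozen linear layer gets a low-rank update; per-rank gates implement
# structured pruning and element-wise gates implement unstructured pruning.
import torch
import torch.nn as nn

class PrunablePETLinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # pre-trained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        # Structured gate: one logit per rank dimension (prunes whole rank-1 components).
        self.rank_gate = nn.Parameter(torch.zeros(rank))
        # Unstructured gate: one logit per entry of the update factor B.
        self.elem_gate = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        g_rank = torch.sigmoid(self.rank_gate)                   # (rank,)
        g_elem = torch.sigmoid(self.elem_gate)                   # (out, rank)
        delta = (self.B * g_elem) @ (g_rank[:, None] * self.A)   # gated low-rank update
        return self.base(x) + x @ delta.t()

    def sparsity_penalty(self):
        # Encourages gates toward zero so unused structure can be pruned away.
        return torch.sigmoid(self.rank_gate).sum() + torch.sigmoid(self.elem_gate).sum()

layer = PrunablePETLinear(16, 16)
out = layer(torch.randn(4, 16))
loss = out.pow(2).mean() + 1e-3 * layer.sparsity_penalty()
loss.backward()   # only the PET parameters and gates receive gradients
```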
Related papers
- HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning [55.88910947643436]
We propose a unified framework for continual learning (CL) with pre-trained models (PTMs) and parameter-efficient tuning (PET).
We present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the objective by incorporating task-specific and task-shared knowledge.
Our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
arXiv Detail & Related papers (2024-07-07T01:50:25Z)
- Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
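As a rough illustration of this architecture (a sketch assuming standard bottleneck adapters, not the paper's implementation), parallel adapter experts can be combined through a learned softmax gate:
```python
# Hypothetical sketch: several parallel bottleneck adapters ("experts")
# combined by a learned gate, loosely following the mixture-of-adapters idea.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))   # residual added by the caller

class MixtureOfAdapters(nn.Module):
    def __init__(self, dim, num_experts=3, bottleneck=16):
        super().__init__()
        self.experts = nn.ModuleList(
            BottleneckAdapter(dim, bottleneck) for _ in range(num_experts)
        )
        self.gate_logits = nn.Parameter(torch.zeros(num_experts))   # learned mixture weights

    def forward(self, hidden):
        weights = torch.softmax(self.gate_logits, dim=0)             # (num_experts,)
        expert_out = torch.stack([e(hidden) for e in self.experts])  # (E, batch, seq, dim)
        mixed = (weights.view(-1, 1, 1, 1) * expert_out).sum(dim=0)
        return hidden + mixed                                        # residual connection

moa = MixtureOfAdapters(dim=32)
y = moa(torch.randn(2, 10, 32))   # (batch, seq, dim)
```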
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
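The sketch below is a simplified stand-in for this idea: it uses a plain two-sided factorization rather than a true matrix product operator, but shows how a central factor shared across layers keeps the per-layer parameter count small (all names and sizes are illustrative).
```python
# Simplified sketch of cross-layer parameter sharing via factorization.
# NOTE: a real MPO decomposition uses a chain of local tensors; here a plain
# two-sided factorization stands in for the shared central factor.
import torch
import torch.nn as nn

class SharedFactorLinear(nn.Module):
    """W_l = U_l @ C @ V_l, with the central matrix C shared across layers."""
    def __init__(self, central, dim, rank):
        super().__init__()
        self.central = central                                  # shared (rank x rank) parameter
        self.U = nn.Parameter(torch.randn(dim, rank) * 0.02)    # layer-specific factor
        self.V = nn.Parameter(torch.randn(rank, dim) * 0.02)    # layer-specific factor

    def forward(self, x):
        weight = self.U @ self.central @ self.V                 # (dim, dim)
        return x @ weight.t()

dim, rank, num_layers = 64, 16, 6
central = nn.Parameter(torch.eye(rank))   # one central factor reused by every layer
layers = nn.ModuleList(SharedFactorLinear(central, dim, rank) for _ in range(num_layers))

x = torch.randn(8, dim)
for layer in layers:
    x = torch.relu(layer(x))
# The parameter count grows with depth only through the small U/V factors.
```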
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Proxyless Neural Architecture Adaptation for Supervised Learning and Self-Supervised Learning [3.766702945560518]
We propose proxyless neural architecture adaptation that is reproducible and efficient.
Our method can be applied to both supervised learning and self-supervised learning.
arXiv Detail & Related papers (2022-05-15T02:49:48Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
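The following generic sketch illustrates the parameter-efficient side of this comparison (it is not that paper's specific method): the pre-trained backbone is frozen and only a small added module is trained, so the per-task trainable parameter count is a tiny fraction of the model.
```python
# Generic sketch of the parameter-efficient paradigm: freeze the pre-trained
# backbone and train only a small task module, so per-task storage stays tiny.
import torch
import torch.nn as nn

backbone = nn.Sequential(               # stand-in for a pre-trained model
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
for p in backbone.parameters():
    p.requires_grad_(False)             # pre-trained weights are never updated

task_head = nn.Linear(128, 2)           # the only trainable parameters

trainable = sum(p.numel() for p in task_head.parameters())
frozen = sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable}, frozen: {frozen}")   # 258 vs 33024 here

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
x, y = torch.randn(16, 128), torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(task_head(backbone(x)), y)
loss.backward()
optimizer.step()
```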
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Learning Features with Parameter-Free Layers [22.92568642331809]
This paper argues that simple built-in parameter-free operations can be a favorable alternative to efficient trainable layers in a network architecture.
Experiments on the ImageNet dataset demonstrate that network architectures with parameter-free operations can offer further efficiency in terms of model speed, number of parameters, and FLOPs.
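As a toy illustration (not the paper's architecture), a trainable downsampling layer can be swapped for a built-in parameter-free operation such as max pooling, which produces the same output shape with zero parameters:
```python
# Toy sketch: replacing a trainable spatial layer with a built-in
# parameter-free operation (max pooling), which contributes zero parameters.
import torch
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

trainable_block = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
parameter_free_block = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 32, 16, 16)
print(trainable_block(x).shape, count_params(trainable_block))            # downsamples, 9248 params
print(parameter_free_block(x).shape, count_params(parameter_free_block))  # same downsampling, 0 params
```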
arXiv Detail & Related papers (2022-02-06T14:03:36Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but this also leads to a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
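A rough sketch of the idea under simplifying assumptions (expert groups formed by naively splitting the hidden units, plus a small added router; the actual method constructs experts and routers more carefully):
```python
# Rough sketch of the MoEfication idea: regroup a ReLU feed-forward layer's
# hidden units into expert groups and let each token use only its top-k groups.
import torch
import torch.nn as nn

class MoEfiedFFN(nn.Module):
    def __init__(self, dim=64, hidden=256, num_experts=8, top_k=2):
        super().__init__()
        assert hidden % num_experts == 0
        self.w_in = nn.Linear(dim, hidden)          # original FFN weights, only regrouped
        self.w_out = nn.Linear(hidden, dim)
        self.router = nn.Linear(dim, num_experts)   # small added router scoring expert groups
        self.num_experts, self.top_k = num_experts, top_k

    def forward(self, x):                            # x: (tokens, dim)
        h = torch.relu(self.w_in(x))                 # (tokens, hidden); mostly zeros in practice
        groups = h.view(x.size(0), self.num_experts, -1)         # (tokens, E, hidden/E)
        scores = self.router(x)                                   # (tokens, E)
        top = scores.topk(self.top_k, dim=-1).indices             # (tokens, k)
        mask = torch.zeros_like(scores).scatter_(1, top, 1.0)     # 1 for the selected groups
        h_sparse = (groups * mask.unsqueeze(-1)).view(x.size(0), -1)
        # A real implementation computes only the selected groups to save FLOPs;
        # here the dense activation is masked purely for clarity.
        return self.w_out(h_sparse)

ffn = MoEfiedFFN()
y = ffn(torch.randn(10, 64))   # each token is served by its top-2 expert groups
```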
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models [46.69439585453071]
We adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters.
Specifically, we design the techniques of one-shot learning and the search space to provide an adaptive and efficient way of developing tiny PLMs.
We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks.
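The toy sketch below illustrates searching over architecture hyper-parameters with a simple random search under a parameter budget; AutoTinyBERT itself uses one-shot NAS with a shared super model, but the kind of search space (layers, hidden size, heads, FFN size) is similar. All values here are illustrative.
```python
# Toy sketch: sample transformer hyper-parameter configurations from a small
# search space and keep those under a parameter budget (random search, not the
# paper's one-shot procedure).
import random
import torch.nn as nn

SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden": [128, 256, 384],
    "heads": [2, 4],
    "ffn": [256, 512, 1024],
}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def build_encoder(cfg):
    layer = nn.TransformerEncoderLayer(
        d_model=cfg["hidden"], nhead=cfg["heads"],
        dim_feedforward=cfg["ffn"], batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=cfg["num_layers"])

budget = 5_000_000
candidates = []
for _ in range(10):
    cfg = sample_config()
    n_params = sum(p.numel() for p in build_encoder(cfg).parameters())
    if n_params <= budget:
        candidates.append((cfg, n_params))   # would next be scored on a proxy task

for cfg, n in sorted(candidates, key=lambda t: t[1]):
    print(cfg, n)
```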
arXiv Detail & Related papers (2021-07-29T00:47:30Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)