Neural Architecture Search for Parameter-Efficient Fine-tuning of Large
Pre-trained Language Models
- URL: http://arxiv.org/abs/2305.16597v1
- Date: Fri, 26 May 2023 03:01:07 GMT
- Title: Neural Architecture Search for Parameter-Efficient Fine-tuning of Large
Pre-trained Language Models
- Authors: Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver
Steeg
- Abstract summary: We propose an efficient NAS method for learning PET architectures via structured and unstructured pruning.
We present experiments on GLUE demonstrating the effectiveness of our algorithm and discuss how PET architectural design choices affect performance in practice.
- Score: 25.33932250843436
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Parameter-efficient tuning (PET) methods fit pre-trained language models
(PLMs) to downstream tasks by either computing a small compressed update for a
subset of model parameters, or appending and fine-tuning a small number of new
model parameters to the pre-trained network. Hand-designed PET architectures
from the literature perform well in practice, but have the potential to be
improved via automated neural architecture search (NAS). We propose an
efficient NAS method for learning PET architectures via structured and
unstructured pruning. We present experiments on GLUE demonstrating the
effectiveness of our algorithm and discuss how PET architectural design choices
affect performance in practice.
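To make the search-via-pruning idea concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' released code): a frozen pre-trained linear layer is augmented with a low-rank PET update whose rank components carry learnable gates (structured pruning) and whose individual entries carry element-wise gates (unstructured pruning); a sparsity penalty drives unused gates toward zero so the corresponding structure can be pruned from the final architecture.
```python
# Hypothetical sketch of NAS-via-pruning for a PET module (not the paper's code).
# A frozen linear layer gets a low-rank update; per-rank gates implement
# structured pruning and element-wise gates implement unstructured pruning.
import torch
import torch.nn as nn

class PrunablePETLinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # pre-trained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        # Structured gate: one logit per rank dimension (prunes whole rank-1 components).
        self.rank_gate = nn.Parameter(torch.zeros(rank))
        # Unstructured gate: one logit per entry of the update factor B.
        self.elem_gate = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        g_rank = torch.sigmoid(self.rank_gate)                   # (rank,)
        g_elem = torch.sigmoid(self.elem_gate)                   # (out, rank)
        delta = (self.B * g_elem) @ (g_rank[:, None] * self.A)   # gated low-rank update
        return self.base(x) + x @ delta.t()

    def sparsity_penalty(self):
        # Encourages gates toward zero so unused structure can be pruned away.
        return torch.sigmoid(self.rank_gate).sum() + torch.sigmoid(self.elem_gate).sum()

layer = PrunablePETLinear(16, 16)
out = layer(torch.randn(4, 16))
loss = out.pow(2).mean() + 1e-3 * layer.sparsity_penalty()
loss.backward()   # only the PET parameters and gates receive gradients
```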
Related papers
- HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning [55.88910947643436]
We propose a unified framework for continual learning (CL) with pre-trained models (PTMs) and parameter-efficient tuning (PET).
We present Hierarchical Decomposition PET (HiDe-PET), an innovative approach that explicitly optimizes the objective by incorporating task-specific and task-shared knowledge.
Our approach demonstrates remarkably superior performance over a broad spectrum of recent strong baselines.
arXiv Detail & Related papers (2024-07-07T01:50:25Z)
- Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
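As a rough illustration of this architecture (a sketch assuming standard bottleneck adapters, not the paper's implementation), parallel adapter experts can be combined through a learned softmax gate:
```python
# Hypothetical sketch: several parallel bottleneck adapters ("experts")
# combined by a learned gate, loosely following the mixture-of-adapters idea.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))   # residual added by the caller

class MixtureOfAdapters(nn.Module):
    def __init__(self, dim, num_experts=3, bottleneck=16):
        super().__init__()
        self.experts = nn.ModuleList(
            BottleneckAdapter(dim, bottleneck) for _ in range(num_experts)
        )
        self.gate_logits = nn.Parameter(torch.zeros(num_experts))   # learned mixture weights

    def forward(self, hidden):
        weights = torch.softmax(self.gate_logits, dim=0)             # (num_experts,)
        expert_out = torch.stack([e(hidden) for e in self.experts])  # (E, batch, seq, dim)
        mixed = (weights.view(-1, 1, 1, 1) * expert_out).sum(dim=0)
        return hidden + mixed                                        # residual connection

moa = MixtureOfAdapters(dim=32)
y = moa(torch.randn(2, 10, 32))   # (batch, seq, dim)
```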
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only minimal parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
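The sketch below is a simplified stand-in for this idea: it uses a plain two-sided factorization rather than a true matrix product operator, but shows how a central factor shared across layers keeps the per-layer parameter count small (all names and sizes are illustrative).
```python
# Simplified sketch of cross-layer parameter sharing via factorization.
# NOTE: a real MPO decomposition uses a chain of local tensors; here a plain
# two-sided factorization stands in for the shared central factor.
import torch
import torch.nn as nn

class SharedFactorLinear(nn.Module):
    """W_l = U_l @ C @ V_l, with the central matrix C shared across layers."""
    def __init__(self, central, dim, rank):
        super().__init__()
        self.central = central                                  # shared (rank x rank) parameter
        self.U = nn.Parameter(torch.randn(dim, rank) * 0.02)    # layer-specific factor
        self.V = nn.Parameter(torch.randn(rank, dim) * 0.02)    # layer-specific factor

    def forward(self, x):
        weight = self.U @ self.central @ self.V                 # (dim, dim)
        return x @ weight.t()

dim, rank, num_layers = 64, 16, 6
central = nn.Parameter(torch.eye(rank))   # one central factor reused by every layer
layers = nn.ModuleList(SharedFactorLinear(central, dim, rank) for _ in range(num_layers))

x = torch.randn(8, dim)
for layer in layers:
    x = torch.relu(layer(x))
# The parameter count grows with depth only through the small U/V factors.
```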
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Proxyless Neural Architecture Adaptation for Supervised Learning and Self-Supervised Learning [3.766702945560518]
We propose proxyless neural architecture adaptation that is reproducible and efficient.
Our method can be applied to both supervised learning and self-supervised learning.
arXiv Detail & Related papers (2022-05-15T02:49:48Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
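The following generic sketch illustrates the parameter-efficient side of this comparison (it is not that paper's specific method): the pre-trained backbone is frozen and only a small added module is trained, so the per-task trainable parameter count is a tiny fraction of the model.
```python
# Generic sketch of the parameter-efficient paradigm: freeze the pre-trained
# backbone and train only a small task module, so per-task storage stays tiny.
import torch
import torch.nn as nn

backbone = nn.Sequential(               # stand-in for a pre-trained model
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
for p in backbone.parameters():
    p.requires_grad_(False)             # pre-trained weights are never updated

task_head = nn.Linear(128, 2)           # the only trainable parameters

trainable = sum(p.numel() for p in task_head.parameters())
frozen = sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable}, frozen: {frozen}")   # 258 vs 33024 here

optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
x, y = torch.randn(16, 128), torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(task_head(backbone(x)), y)
loss.backward()
optimizer.step()
```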
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Learning Features with Parameter-Free Layers [22.92568642331809]
This paper argues that simple built-in parameter-free operations can be a favorable alternative to efficient trainable layers in a network architecture.
Experiments on the ImageNet dataset demonstrate that network architectures with parameter-free operations can offer further efficiency in terms of model speed, number of parameters, and FLOPs.
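As a toy illustration (not the paper's architecture), a trainable downsampling layer can be swapped for a built-in parameter-free operation such as max pooling, which produces the same output shape with zero parameters:
```python
# Toy sketch: replacing a trainable spatial layer with a built-in
# parameter-free operation (max pooling), which contributes zero parameters.
import torch
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

trainable_block = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
parameter_free_block = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 32, 16, 16)
print(trainable_block(x).shape, count_params(trainable_block))            # downsamples, 9248 params
print(parameter_free_block(x).shape, count_params(parameter_free_block))  # same downsampling, 0 params
```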
arXiv Detail & Related papers (2022-02-06T14:03:36Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but this also leads to a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
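A rough sketch of the idea under simplifying assumptions (expert groups formed by naively splitting the hidden units, plus a small added router; the actual method constructs experts and routers more carefully):
```python
# Rough sketch of the MoEfication idea: regroup a ReLU feed-forward layer's
# hidden units into expert groups and let each token use only its top-k groups.
import torch
import torch.nn as nn

class MoEfiedFFN(nn.Module):
    def __init__(self, dim=64, hidden=256, num_experts=8, top_k=2):
        super().__init__()
        assert hidden % num_experts == 0
        self.w_in = nn.Linear(dim, hidden)          # original FFN weights, only regrouped
        self.w_out = nn.Linear(hidden, dim)
        self.router = nn.Linear(dim, num_experts)   # small added router scoring expert groups
        self.num_experts, self.top_k = num_experts, top_k

    def forward(self, x):                            # x: (tokens, dim)
        h = torch.relu(self.w_in(x))                 # (tokens, hidden); mostly zeros in practice
        groups = h.view(x.size(0), self.num_experts, -1)         # (tokens, E, hidden/E)
        scores = self.router(x)                                   # (tokens, E)
        top = scores.topk(self.top_k, dim=-1).indices             # (tokens, k)
        mask = torch.zeros_like(scores).scatter_(1, top, 1.0)     # 1 for the selected groups
        h_sparse = (groups * mask.unsqueeze(-1)).view(x.size(0), -1)
        # A real implementation computes only the selected groups to save FLOPs;
        # here the dense activation is masked purely for clarity.
        return self.w_out(h_sparse)

ffn = MoEfiedFFN()
y = ffn(torch.randn(10, 64))   # each token is served by its top-2 expert groups
```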
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models [46.69439585453071]
We adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters.
Specifically, we design the techniques of one-shot learning and the search space to provide an adaptive and efficient way of developing tiny PLMs.
We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks.
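The toy sketch below illustrates searching over architecture hyper-parameters with a simple random search under a parameter budget; AutoTinyBERT itself uses one-shot NAS with a shared super model, but the kind of search space (layers, hidden size, heads, FFN size) is similar. All values here are illustrative.
```python
# Toy sketch: sample transformer hyper-parameter configurations from a small
# search space and keep those under a parameter budget (random search, not the
# paper's one-shot procedure).
import random
import torch.nn as nn

SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden": [128, 256, 384],
    "heads": [2, 4],
    "ffn": [256, 512, 1024],
}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def build_encoder(cfg):
    layer = nn.TransformerEncoderLayer(
        d_model=cfg["hidden"], nhead=cfg["heads"],
        dim_feedforward=cfg["ffn"], batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=cfg["num_layers"])

budget = 5_000_000
candidates = []
for _ in range(10):
    cfg = sample_config()
    n_params = sum(p.numel() for p in build_encoder(cfg).parameters())
    if n_params <= budget:
        candidates.append((cfg, n_params))   # would next be scored on a proxy task

for cfg, n in sorted(candidates, key=lambda t: t[1]):
    print(cfg, n)
```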
arXiv Detail & Related papers (2021-07-29T00:47:30Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)