UniPELT: A Unified Framework for Parameter-Efficient Language Model
Tuning
- URL: http://arxiv.org/abs/2110.07577v1
- Date: Thu, 14 Oct 2021 17:40:08 GMT
- Title: UniPELT: A Unified Framework for Parameter-Efficient Language Model
Tuning
- Authors: Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei
Han, Wen-tau Yih, Madian Khabsa
- Abstract summary: We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup.
Remarkably, on the GLUE benchmark, UniPELT consistently achieves 13pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
- Score: 64.638804236566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional fine-tuning of pre-trained language models tunes all model
parameters and stores a full model copy for each downstream task, which has
become increasingly infeasible as the model size grows larger. Recent
parameter-efficient language model tuning (PELT) methods manage to match the
performance of fine-tuning with much fewer trainable parameters and perform
especially well when the training data is limited. However, different PELT
methods may perform rather differently on the same task, making it nontrivial
to select the most appropriate method for a specific task, especially
considering the fast-growing number of new PELT methods and downstream tasks.
In light of model diversity and the difficulty of model selection, we propose a
unified framework, UniPELT, which incorporates different PELT methods as
submodules and learns to activate the ones that best suit the current data or
task setup. Remarkably, on the GLUE benchmark, UniPELT consistently achieves
1~3pt gains compared to the best individual PELT method that it incorporates
and even outperforms fine-tuning under different setups. Moreover, UniPELT
often surpasses the upper bound when taking the best performance of all its
submodules used individually on each task, indicating that a mixture of
multiple PELT methods may be inherently more effective than single methods.
Related papers
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z) - CoLLiE: Collaborative Training of Large Language Models in an Efficient
Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models.
With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z) - Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on the adapter-tuning and hypernetwork.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z) - Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
arXiv Detail & Related papers (2023-10-07T08:55:54Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Differentiable Entailment for Parameter Efficient Few Shot Learning [0.0]
We propose a new technique for parameter efficient few shot learning.
We quantify the tradeoff between parameter efficiency and performance in the few-shot regime.
We propose a simple model agnostic approach that can be extended to any task.
arXiv Detail & Related papers (2023-01-31T00:31:11Z) - Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision
Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - WARP: Word-level Adversarial ReProgramming [13.08689221166729]
In many applications it is preferable to tune much smaller sets of parameters, so that the majority of parameters can be shared across multiple tasks.
We present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation.
We show that this approach outperforms other methods with a similar number of trainable parameters on SST-2 and MNLI datasets.
arXiv Detail & Related papers (2021-01-01T00:41:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.