$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale
Pre-Trained Models via Label Representation Learning
- URL: http://arxiv.org/abs/2202.09817v1
- Date: Sun, 20 Feb 2022 13:49:34 GMT
- Title: $\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale
Pre-Trained Models via Label Representation Learning
- Authors: Yitao Liu, Chenxin An, Xipeng Qiu
- Abstract summary: $\mathcal{Y}$-tuning learns dense representations for labels defined in a given task and aligns them to fixed feature representations.
For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of the performance of full fine-tuning on the GLUE benchmark.
- Score: 47.742220473129684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the success of large-scale pre-trained models (PTMs), how to
efficiently adapt PTMs to downstream tasks has attracted tremendous attention,
especially for PTMs with billions of parameters. Although some
parameter-efficient tuning paradigms have been proposed to address this
problem, they still require large resources to compute the gradients in the
training phase. In this paper, we propose $\mathcal{Y}$-Tuning, an efficient
yet effective paradigm to adapt frozen large-scale PTMs to specific downstream
tasks. $\mathcal{Y}$-tuning learns dense representations for labels
$\mathcal{Y}$ defined in a given task and aligns them to fixed feature
representations. Since it tunes neither the input-text features nor the model
parameters, $\mathcal{Y}$-tuning is both parameter-efficient and
training-efficient. For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion
parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of the performance
of full fine-tuning on the GLUE benchmark with only $2\%$ tunable parameters
and far lower training costs.
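To make the paradigm concrete, here is a minimal sketch of the $\mathcal{Y}$-Tuning idea (not the authors' code): the PTM stays frozen, and only dense label embeddings plus a small alignment module are trained. The class name, the use of cross-attention, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class YTuningHead(nn.Module):
    """Trainable label representations aligned to frozen PTM features."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One dense representation per label in the task's label set Y.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim))
        # Lightweight alignment module (cross-attention is an assumption).
        self.align = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                           batch_first=True)

    def forward(self, frozen_feats: torch.Tensor) -> torch.Tensor:
        # frozen_feats: (batch, seq_len, hidden_dim), produced by the frozen
        # PTM under torch.no_grad(), so no gradient reaches the PTM.
        batch = frozen_feats.size(0)
        queries = self.label_emb.unsqueeze(0).expand(batch, -1, -1)
        aligned, _ = self.align(queries, frozen_feats, frozen_feats)
        # Score each label by agreement between its query and aligned output.
        return (aligned * queries).sum(dim=-1)  # (batch, num_labels) logits
```

Because the frozen features can be computed once under `torch.no_grad()`, no gradients flow through the large PTM, which is the source of the training efficiency claimed above.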
Related papers
- Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation [20.47507483613317]
One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT).
OFT rigorously preserves the angular distances within the parameter space in order to retain the pretrained knowledge.
We propose quasi-Givens Orthogonal Fine-Tuning (qGOFT), which uses Givens rotations to address these problems.
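A hedged sketch of the underlying mechanism, not the paper's implementation: a Givens rotation $G(i, j, \theta)$ is orthogonal, so applying it to a frozen weight preserves angular distances while exposing only the rotation angle as a trainable parameter.

```python
import torch

def givens_rotate(W: torch.Tensor, i: int, j: int,
                  theta: torch.Tensor) -> torch.Tensor:
    """Apply the orthogonal Givens rotation G(i, j, theta) to rows i, j of W."""
    c, s = torch.cos(theta), torch.sin(theta)
    rows = list(W.unbind(0))
    rows[i], rows[j] = c * W[i] - s * W[j], s * W[i] + c * W[j]
    return torch.stack(rows)

# Only the angle is trainable; the pretrained weight stays frozen.
W_frozen = torch.randn(4, 4)
theta = torch.zeros(1, requires_grad=True)   # identity rotation at init
W_adapted = givens_rotate(W_frozen, 0, 1, theta)
```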
arXiv Detail & Related papers (2024-04-05T15:28:44Z)
- Parameter-efficient is not sufficient: Exploring Parameter, Memory, and
Time Efficient Adapter Tuning for Dense Predictions [9.068569788978854]
Parameter-efficient transfer learning (PETL) methods have shown promising performance in adapting to downstream tasks with only a few trainable parameters.
However, PETL methods in computer vision (CV) can be computationally expensive and require large amounts of memory and training time.
The proposed $\mathrm{E^3VA}$ can save up to 62.2% training memory and 26.2% training time on average.
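For context, a generic bottleneck adapter of the kind PETL methods insert is sketched below; this is not the paper's $\mathrm{E^3VA}$ design. Even with few trainable parameters, gradients typically must flow through the frozen backbone, which is where the memory and time costs discussed above arise.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter inserted into a frozen backbone."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)     # project back up

    def forward(self, x):
        # Residual form roughly preserves the backbone's behavior at init
        # (exactly, if self.up is zero-initialized).
        return x + self.up(self.act(self.down(x)))
```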
arXiv Detail & Related papers (2023-06-16T09:54:07Z)
- Tune As You Scale: Hyperparameter Optimization For Compute Efficient
Training [0.0]
We propose a practical method for robustly tuning large models.
CARBS performs local search around the performance-cost Pareto frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
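A toy sketch of cost-aware local search around a performance-cost Pareto frontier, in the spirit of the description above: the real CARBS is a Bayesian optimization algorithm, and this simplification (random local perturbation of frontier members) is ours.

```python
import random

def pareto_front(obs):
    # obs: list of (cost, perf, params); keep non-dominated points.
    return [o for o in obs
            if not any(c <= o[0] and p >= o[1] and (c, p) != (o[0], o[1])
                       for c, p, _ in obs)]

def propose(obs, scale=0.2):
    # Locally perturb a random frontier member's hyperparameters.
    _, _, params = random.choice(pareto_front(obs))
    return {k: v * (1 + random.uniform(-scale, scale))
            for k, v in params.items()}

obs = [(1.0, 0.60, {"lr": 1e-3, "width": 256}),
       (2.0, 0.72, {"lr": 5e-4, "width": 512})]
candidate = propose(obs)   # next configuration to train and evaluate
```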
arXiv Detail & Related papers (2023-06-13T18:22:24Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, meaning that one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
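SSF reduces to a very small amount of code; the sketch below follows the description directly, with only a per-feature scale and shift trained on top of frozen features.

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """Per-feature scale (gamma) and shift (beta) on frozen features."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # scale, identity init
        self.beta = nn.Parameter(torch.zeros(dim))   # shift, zero init

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (..., dim) deep features from a frozen pre-trained model;
        # only gamma and beta receive gradients.
        return feats * self.gamma + self.beta
```

Because the transform is linear, such scale/shift parameters can typically be merged into adjacent weights at inference time, adding no extra cost.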
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of
Large-Scale Pre-Trained Language Models [19.640997611256168]
We propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning of only some parts of the quantized parameters for a target task.
Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors.
We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving >10x compression ratio under 4-bit quantization and >1,000x reduction in the number of trainable parameters.
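A hedged sketch of the binary-coding quantization idea described above: the full-precision weight is factorized into binary matrices plus per-row scaling factors, and only the scaling factors would be fine-tuned. The greedy residual fitting below is a common BCQ heuristic, not necessarily the paper's exact procedure.

```python
import torch

def bcq(W: torch.Tensor, num_bits: int = 4):
    """Greedy binary-coding quantization: W ~ sum_k alpha_k * B_k."""
    binaries, alphas, residual = [], [], W.clone()
    for _ in range(num_bits):
        B = torch.sign(residual)                      # {-1, +1} codes
        a = residual.abs().mean(dim=1, keepdim=True)  # per-row scale
        binaries.append(B)
        alphas.append(a)
        residual = residual - a * B
    return binaries, alphas

W = torch.randn(8, 8)
binaries, alphas = bcq(W)
# During adaptation, the binaries stay frozen and only the tiny alpha
# tensors would be registered as trainable parameters.
W_hat = sum(a * B for a, B in zip(alphas, binaries))  # reconstruction
```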
arXiv Detail & Related papers (2022-10-08T00:36:00Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
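A back-of-envelope illustration of the cost asymmetry (the numbers are hypothetical, not from the paper): ICL re-encodes all k in-context examples for every prediction, while a fine-tuned model processes only the query.

```python
# Hypothetical sizes, chosen only to illustrate the scaling.
k, example_tokens, query_tokens = 32, 100, 100

# ICL: every prediction re-encodes all k examples plus the query.
icl_tokens_per_query = k * example_tokens + query_tokens    # 3300

# PEFT: after a one-time training run, each prediction sees only the query.
peft_tokens_per_query = query_tokens                        # 100

print(icl_tokens_per_query / peft_tokens_per_query)         # 33.0x per query
```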
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
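One natural reading of knowledge inheritance is a distillation-style objective in which an existing PLM guides the new model's pre-training; the sketch below follows that reading and may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def inheritance_loss(student_logits, teacher_logits, labels,
                     alpha=0.5, T=2.0):
    # Standard LM loss on the data ...
    lm = F.cross_entropy(student_logits, labels)
    # ... plus a KL term pulling the student toward the frozen teacher.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * lm + alpha * kd

# Tiny usage example with random tensors (vocab size 100, batch of 4).
logits_s, logits_t = torch.randn(4, 100), torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
loss = inheritance_loss(logits_s, logits_t, labels)
```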
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [85.6357778621526]
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
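A minimal sketch of the idea: a short sequence of continuous prefix vectors is learned while the LM itself stays frozen. Real prefix-tuning prepends learned key/value pairs at every attention layer; prepending to the input embeddings, as below, is the simplest variant.

```python
import torch
import torch.nn as nn

class PrefixTuning(nn.Module):
    """Learned continuous prefix prepended to the frozen LM's inputs."""

    def __init__(self, prefix_len: int, hidden_dim: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_dim) from the frozen LM's
        # embedding layer; only self.prefix receives gradients.
        batch = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)
```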
arXiv Detail & Related papers (2021-01-01T08:00:36Z)
- Provably Efficient Reinforcement Learning for Discounted MDPs with
Feature Mapping [99.59319332864129]
In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs).
We propose a novel algorithm that makes use of the feature mapping and obtains an $\tilde{O}(d\sqrt{T}/(1-\gamma)^2)$ regret.
Our upper and lower bound results together suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
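To make the feature-mapping assumption concrete: the action-value function is taken to be (approximately) linear in a known feature map, $Q(s, a) \approx \phi(s, a)^\top \theta$. The toy TD-style update below only illustrates this representation; the paper's algorithm and regret analysis are substantially more involved, and the feature map here is a made-up example.

```python
import numpy as np

d = 4
phi = lambda s, a: np.array([s, a, s * a, 1.0])   # hypothetical feature map

def q_value(theta, s, a):
    return phi(s, a) @ theta                      # Q(s, a) = phi(s, a)^T theta

def td_update(theta, s, a, r, s_next, actions, gamma=0.99, lr=0.1):
    # One-step TD update in the discounted setting; returns new theta.
    target = r + gamma * max(q_value(theta, s_next, b) for b in actions)
    return theta + lr * (target - q_value(theta, s, a)) * phi(s, a)

theta = np.zeros(d)
theta = td_update(theta, s=0.5, a=1.0, r=1.0, s_next=0.7, actions=[0.0, 1.0])
```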
arXiv Detail & Related papers (2020-06-23T17:08:54Z)