$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale
Pre-Trained Models via Label Representation Learning
- URL: http://arxiv.org/abs/2202.09817v1
- Date: Sun, 20 Feb 2022 13:49:34 GMT
- Title: $\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale
Pre-Trained Models via Label Representation Learning
- Authors: Yitao Liu, Chenxin An, Xipeng Qiu
- Abstract summary: $\mathcal{Y}$-tuning learns dense representations for labels defined in a given task and aligns them to fixed feature representations.
For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of the performance of full fine-tuning on the GLUE benchmark.
- Score: 47.742220473129684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the success of large-scale pre-trained models (PTMs), how to
efficiently adapt PTMs to downstream tasks has attracted tremendous attention,
especially for PTMs with billions of parameters. Although some
parameter-efficient tuning paradigms have been proposed to address this
problem, they still require large resources to compute the gradients in the
training phase. In this paper, we propose $\mathcal{Y}$-Tuning, an efficient
yet effective paradigm to adapt frozen large-scale PTMs to specific downstream
tasks. $\mathcal{Y}$-tuning learns dense representations for labels
$\mathcal{Y}$ defined in a given task and aligns them to fixed feature
representations. Since it tunes neither the input-text features nor the model
parameters, $\mathcal{Y}$-tuning is both parameter-efficient and
training-efficient. For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion
parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of the performance
of full fine-tuning on the GLUE benchmark with only $2\%$ tunable parameters
and far lower training costs.
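To make the paradigm concrete, here is a minimal sketch of the $\mathcal{Y}$-Tuning idea (not the authors' code): the PTM stays frozen, and only dense label embeddings plus a small alignment module are trained. The class name, the use of cross-attention, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class YTuningHead(nn.Module):
    """Trainable label representations aligned to frozen PTM features."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One dense representation per label in the task's label set Y.
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim))
        # Lightweight alignment module (cross-attention is an assumption).
        self.align = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                           batch_first=True)

    def forward(self, frozen_feats: torch.Tensor) -> torch.Tensor:
        # frozen_feats: (batch, seq_len, hidden_dim), produced by the frozen
        # PTM under torch.no_grad(), so no gradient reaches the PTM.
        batch = frozen_feats.size(0)
        queries = self.label_emb.unsqueeze(0).expand(batch, -1, -1)
        aligned, _ = self.align(queries, frozen_feats, frozen_feats)
        # Score each label by agreement between its query and aligned output.
        return (aligned * queries).sum(dim=-1)  # (batch, num_labels) logits
```

Because the frozen features can be computed once under `torch.no_grad()`, no gradients flow through the large PTM, which is the source of the training efficiency claimed above.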
Related papers
- Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation [20.47507483613317]
One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT).
OFT rigorously preserves the angular distances within the parameter space in order to retain the pretrained knowledge.
We propose quasi-Givens Orthogonal Fine-Tuning (qGOFT), which uses Givens rotations to address these problems.
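A hedged sketch of the underlying mechanism, not the paper's implementation: a Givens rotation $G(i, j, \theta)$ is orthogonal, so applying it to a frozen weight preserves angular distances while exposing only the rotation angle as a trainable parameter.

```python
import torch

def givens_rotate(W: torch.Tensor, i: int, j: int,
                  theta: torch.Tensor) -> torch.Tensor:
    """Apply the orthogonal Givens rotation G(i, j, theta) to rows i, j of W."""
    c, s = torch.cos(theta), torch.sin(theta)
    rows = list(W.unbind(0))
    rows[i], rows[j] = c * W[i] - s * W[j], s * W[i] + c * W[j]
    return torch.stack(rows)

# Only the angle is trainable; the pretrained weight stays frozen.
W_frozen = torch.randn(4, 4)
theta = torch.zeros(1, requires_grad=True)   # identity rotation at init
W_adapted = givens_rotate(W_frozen, 0, 1, theta)
```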
arXiv Detail & Related papers (2024-04-05T15:28:44Z)
- Parameter-efficient is not sufficient: Exploring Parameter, Memory, and
Time Efficient Adapter Tuning for Dense Predictions [9.068569788978854]
Parameter-efficient transfer learning (PETL) methods have shown promising performance in adapting to downstream tasks with only a few trainable parameters.
However, PETL methods in computer vision (CV) can be computationally expensive and require large amounts of memory and training time.
The proposed $\mathrm{E^3VA}$ can save up to 62.2% training memory and 26.2% training time on average.
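For context, a generic bottleneck adapter of the kind PETL methods insert is sketched below; this is not the paper's $\mathrm{E^3VA}$ design. Even with few trainable parameters, gradients typically must flow through the frozen backbone, which is where the memory and time costs discussed above arise.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter inserted into a frozen backbone."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)     # project back up

    def forward(self, x):
        # Residual form roughly preserves the backbone's behavior at init
        # (exactly, if self.up is zero-initialized).
        return x + self.up(self.act(self.down(x)))
```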
arXiv Detail & Related papers (2023-06-16T09:54:07Z)
- Tune As You Scale: Hyperparameter Optimization For Compute Efficient
Training [0.0]
We propose a practical method for robustly tuning large models.
CARBS performs local search around the performance-cost Pareto frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
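A toy sketch of cost-aware local search around a performance-cost Pareto frontier, in the spirit of the description above: the real CARBS is a Bayesian optimization algorithm, and this simplification (random local perturbation of frontier members) is ours.

```python
import random

def pareto_front(obs):
    # obs: list of (cost, perf, params); keep non-dominated points.
    return [o for o in obs
            if not any(c <= o[0] and p >= o[1] and (c, p) != (o[0], o[1])
                       for c, p, _ in obs)]

def propose(obs, scale=0.2):
    # Locally perturb a random frontier member's hyperparameters.
    _, _, params = random.choice(pareto_front(obs))
    return {k: v * (1 + random.uniform(-scale, scale))
            for k, v in params.items()}

obs = [(1.0, 0.60, {"lr": 1e-3, "width": 256}),
       (2.0, 0.72, {"lr": 5e-4, "width": 512})]
candidate = propose(obs)   # next configuration to train and evaluate
```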
arXiv Detail & Related papers (2023-06-13T18:22:24Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, meaning that one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
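SSF reduces to a very small amount of code; the sketch below follows the description directly, with only a per-feature scale and shift trained on top of frozen features.

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """Per-feature scale (gamma) and shift (beta) on frozen features."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # scale, identity init
        self.beta = nn.Parameter(torch.zeros(dim))   # shift, zero init

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (..., dim) deep features from a frozen pre-trained model;
        # only gamma and beta receive gradients.
        return feats * self.gamma + self.beta
```

Because the transform is linear, such scale/shift parameters can typically be merged into adjacent weights at inference time, adding no extra cost.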
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of
Large-Scale Pre-Trained Language Models [19.640997611256168]
We propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning of only some parts of the quantized parameters for a target task.
Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors.
We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving >10x compression ratio under 4-bit quantization and >1,000x reduction in the number of trainable parameters.
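A hedged sketch of the binary-coding quantization idea described above: the full-precision weight is factorized into binary matrices plus per-row scaling factors, and only the scaling factors would be fine-tuned. The greedy residual fitting below is a common BCQ heuristic, not necessarily the paper's exact procedure.

```python
import torch

def bcq(W: torch.Tensor, num_bits: int = 4):
    """Greedy binary-coding quantization: W ~ sum_k alpha_k * B_k."""
    binaries, alphas, residual = [], [], W.clone()
    for _ in range(num_bits):
        B = torch.sign(residual)                      # {-1, +1} codes
        a = residual.abs().mean(dim=1, keepdim=True)  # per-row scale
        binaries.append(B)
        alphas.append(a)
        residual = residual - a * B
    return binaries, alphas

W = torch.randn(8, 8)
binaries, alphas = bcq(W)
# During adaptation, the binaries stay frozen and only the tiny alpha
# tensors would be registered as trainable parameters.
W_hat = sum(a * B for a, B in zip(alphas, binaries))  # reconstruction
```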
arXiv Detail & Related papers (2022-10-08T00:36:00Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
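A back-of-envelope illustration of the cost asymmetry (the numbers are hypothetical, not from the paper): ICL re-encodes all k in-context examples for every prediction, while a fine-tuned model processes only the query.

```python
# Hypothetical sizes, chosen only to illustrate the scaling.
k, example_tokens, query_tokens = 32, 100, 100

# ICL: every prediction re-encodes all k examples plus the query.
icl_tokens_per_query = k * example_tokens + query_tokens    # 3300

# PEFT: after a one-time training run, each prediction sees only the query.
peft_tokens_per_query = query_tokens                        # 100

print(icl_tokens_per_query / peft_tokens_per_query)         # 33.0x per query
```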
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
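One natural reading of knowledge inheritance is a distillation-style objective in which an existing PLM guides the new model's pre-training; the sketch below follows that reading and may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def inheritance_loss(student_logits, teacher_logits, labels,
                     alpha=0.5, T=2.0):
    # Standard LM loss on the data ...
    lm = F.cross_entropy(student_logits, labels)
    # ... plus a KL term pulling the student toward the frozen teacher.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * lm + alpha * kd

# Tiny usage example with random tensors (vocab size 100, batch of 4).
logits_s, logits_t = torch.randn(4, 100), torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
loss = inheritance_loss(logits_s, logits_t, labels)
```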
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [85.6357778621526]
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
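A minimal sketch of the idea: a short sequence of continuous prefix vectors is learned while the LM itself stays frozen. Real prefix-tuning prepends learned key/value pairs at every attention layer; prepending to the input embeddings, as below, is the simplest variant.

```python
import torch
import torch.nn as nn

class PrefixTuning(nn.Module):
    """Learned continuous prefix prepended to the frozen LM's inputs."""

    def __init__(self, prefix_len: int, hidden_dim: int):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden_dim) from the frozen LM's
        # embedding layer; only self.prefix receives gradients.
        batch = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)
```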
arXiv Detail & Related papers (2021-01-01T08:00:36Z)
- Provably Efficient Reinforcement Learning for Discounted MDPs with
Feature Mapping [99.59319332864129]
In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs).
We propose a novel algorithm that makes use of the feature mapping and obtains an $\tilde{O}(d\sqrt{T}/(1-\gamma)^2)$ regret.
Our upper and lower bound results together suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
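To make the feature-mapping assumption concrete: the action-value function is taken to be (approximately) linear in a known feature map, $Q(s, a) \approx \phi(s, a)^\top \theta$. The toy TD-style update below only illustrates this representation; the paper's algorithm and regret analysis are substantially more involved, and the feature map here is a made-up example.

```python
import numpy as np

d = 4
phi = lambda s, a: np.array([s, a, s * a, 1.0])   # hypothetical feature map

def q_value(theta, s, a):
    return phi(s, a) @ theta                      # Q(s, a) = phi(s, a)^T theta

def td_update(theta, s, a, r, s_next, actions, gamma=0.99, lr=0.1):
    # One-step TD update in the discounted setting; returns new theta.
    target = r + gamma * max(q_value(theta, s_next, b) for b in actions)
    return theta + lr * (target - q_value(theta, s, a)) * phi(s, a)

theta = np.zeros(d)
theta = td_update(theta, s=0.5, a=1.0, r=1.0, s_next=0.7, actions=[0.0, 1.0])
```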
arXiv Detail & Related papers (2020-06-23T17:08:54Z)