When does Parameter-Efficient Transfer Learning Work for Machine
Translation?
- URL: http://arxiv.org/abs/2205.11277v1
- Date: Mon, 23 May 2022 12:49:46 GMT
- Title: When does Parameter-Efficient Transfer Learning Work for Machine
Translation?
- Authors: Ahmet Üstün, Asa Cooper Stickland
- Abstract summary: Prior work indicates that PEFTs may not work as well for machine translation (MT).
We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language-pairs, and (3) different pre-trained models.
We find that using PEFTs with a larger pre-trained model outperforms full fine-tuning with a smaller model, and for smaller training data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.
- Score: 8.862707047517913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting
large pre-trained models while only tuning a small number of parameters. They
have been shown to be competitive with full model fine-tuning for many
downstream tasks. However, prior work indicates that PEFTs may not work as well
for machine translation (MT), and there is no comprehensive study showing when
PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT,
considering (1) various parameter budgets, (2) a diverse set of language-pairs,
and (3) different pre-trained models. We find that 'adapters', in which small
feed-forward networks are added after every layer, are indeed on par with full
model fine-tuning when the parameter budget corresponds to 10% of total model
parameters. Nevertheless, as the number of tuned parameters decreases, the
performance of PEFTs decreases. The magnitude of this decrease depends on the
language pair, with PEFTs particularly struggling for distantly related
language-pairs. We find that using PEFTs with a larger pre-trained model
outperforms full fine-tuning with a smaller model, and for smaller training
data sizes, PEFTs outperform full fine-tuning for the same pre-trained model.
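For readers who want a concrete picture of the 'adapters' the abstract refers to, the following is a minimal PyTorch sketch, assuming a standard bottleneck design (down-projection, non-linearity, up-projection, residual connection) inserted after every Transformer layer; the class name, bottleneck width, and near-zero initialisation are illustrative choices, not the paper's exact configuration. Only the adapter weights are trained while the pre-trained model stays frozen.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Small feed-forward network added after a Transformer (sub-)layer.
    # Only these parameters are trained; the backbone remains frozen.
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # project down to the bottleneck
        self.up = nn.Linear(bottleneck, d_model)    # project back up
        self.act = nn.ReLU()
        # Near-zero init so the adapter starts close to an identity mapping.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))  # residual connection

adapter = BottleneckAdapter(d_model=512, bottleneck=64)
x = torch.randn(2, 10, 512)        # (batch, sequence length, hidden size)
print(adapter(x).shape)            # torch.Size([2, 10, 512])

Widening or narrowing the bottleneck is how one moves along the parameter-budget axis that the study varies.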
Related papers
- Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models [24.62337386603331]
Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world.
To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT) has gained popularity.
This paper examines the strengths and weaknesses of each tuning strategy, shifting the focus away from the efficiency typically associated with these approaches.
arXiv Detail & Related papers (2024-10-29T07:55:50Z) - An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model [33.853380101736306]
A natural expectation for PEFTs is that the performance of various PEFTs is positively related to the data size and fine-tunable parameter size.
We find that such an intuition holds only if the downstream data and task are not consistent with pre-training.
For downstream fine-tuning consistent with pre-training, data size no longer affects the performance, while the influence of fine-tunable parameter size is not monotonous.
arXiv Detail & Related papers (2024-03-13T11:33:38Z) - Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning [12.648711621637663]
This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models.
We propose Context-PEFT, which learns different groups of adaptor parameters based on the token's domain.
Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints.
arXiv Detail & Related papers (2023-12-14T13:00:24Z) - Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning
for Versatile Multimodal Modeling [42.42235704360381]
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on a wide range of tasks.
Their large scale makes it impossible to adapt and deploy fully specialized models for every task of interest.
In this work, we describe AdaLink as a non-intrusive PEFT technique that achieves competitive performance.
arXiv Detail & Related papers (2023-10-18T16:43:08Z) - DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [14.975436239088312]
We propose DePT, which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates.
We demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios.
arXiv Detail & Related papers (2023-09-11T00:02:05Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient fine-tuning method termed SSF, in which one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full fine-tuning (a minimal sketch of this scale-and-shift idea appears after this list).
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning (PEFT) offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but that capacity also incurs a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
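To make the LoRA entry above concrete, here is a minimal PyTorch sketch of the idea: the pre-trained weight matrix is frozen and a trainable low-rank update B·A is added to it, so only the small A and B matrices are learned. The rank, scaling factor, and initialisation below follow common practice and are illustrative assumptions rather than the exact settings of the paper.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen pre-trained linear layer and adds a trainable low-rank update.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))        # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank path; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

frozen = nn.Linear(512, 512)
lora = LoRALinear(frozen, rank=8)
trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters versus 262,656 frozen ones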
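Similarly, the scale-and-shift idea behind SSF (referenced earlier in this list) reduces, in its simplest form, to learning a per-channel scale and shift applied to the features of a frozen backbone. The sketch below is a minimal illustration under that reading; module placement and the inference-time re-parameterisation described in the SSF paper are omitted.

import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    # Per-channel scale (gamma) and shift (beta) applied to frozen backbone features.
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # scale, initialised to 1
        self.beta = nn.Parameter(torch.zeros(dim))   # shift, initialised to 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasts over batch and sequence dimensions; only gamma and beta are trained.
        return x * self.gamma + self.beta

ss = ScaleShift(dim=512)
features = torch.randn(2, 10, 512)   # features from a frozen pre-trained model
print(ss(features).shape)            # torch.Size([2, 10, 512])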