Multilingual Speech Translation with Efficient Finetuning of Pretrained
Models
- URL: http://arxiv.org/abs/2010.12829v4
- Date: Sat, 2 Jan 2021 08:16:21 GMT
- Title: Multilingual Speech Translation with Efficient Finetuning of Pretrained
Models
- Authors: Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino,
Alexei Baevski, Alexis Conneau, Michael Auli
- Abstract summary: Minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
- Score: 82.22294901727933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple yet effective approach to building multilingual
speech-to-text (ST) translation models by efficient transfer learning from a
pretrained speech encoder and text decoder. Our key finding is that
minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot
crosslingual and cross-modality transfer by finetuning less than 10% of the
pretrained parameters. This enables effective use of large pretrained models
at low training cost. Using wav2vec 2.0 for acoustic modeling and mBART for
multilingual text generation, our approach sets a new state of the art for 34
translation directions (surpassing cascaded ST for 23 of them) on the
large-scale multilingual ST benchmark CoVoST 2 (+6.4 BLEU on average across
15 En-X directions and +5.1 BLEU on average across 19 X-En directions). Our
approach also demonstrates strong zero-shot performance in a many-to-many
multilingual model (+5.7 BLEU on average across 18 non-English directions),
making it an appealing approach for attaining high-quality speech translation
with improved parameter and data efficiency.
Related papers
- Improved Cross-Lingual Transfer Learning For Automatic Speech
Translation [18.97234151624098]
We show that by initializing the encoder of an encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, we achieve significantly better cross-lingual task knowledge transfer.
We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl.
arXiv Detail & Related papers (2023-06-01T15:19:06Z)
- Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual
Understanding With Multilingual Language Models [95.32691891392903]
In this paper, we do cross-lingual evaluation on various NLU tasks using prompt-tuning and compare it with fine-tuning.
The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets.
arXiv Detail & Related papers (2022-10-22T05:48:02Z)
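As a minimal sketch of the prompt-tuning idea from the entry above: trainable "soft prompt" vectors are prepended to the input embeddings of a frozen multilingual backbone, so only the prompt parameters are updated. Module names, shapes, and the prompt length below are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prepend trainable prompt vectors to a frozen encoder's input embeddings.

    A minimal prompt-tuning sketch: `encoder` is assumed to map input
    embeddings of shape (batch, seq, dim) to hidden states.
    """

    def __init__(self, encoder: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the pretrained backbone
            p.requires_grad = False
        # the soft prompt is the only trainable part
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        prompts = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return self.encoder(torch.cat([prompts, input_embeds], dim=1))
```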
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks in the early stage and then tune the model normally.
Experiments show that DoT consistently improves neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
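The denoising tasks in DoT can be pictured as corrupting a token sequence and training the model to reconstruct it. The random token masking below is a simplified stand-in for the paper's actual noise functions, shown only to make the idea concrete.

```python
import random

MASK = "<mask>"

def add_denoising_noise(tokens, mask_prob=0.3, rng=random):
    """Randomly replace tokens with a mask symbol.

    A simplified stand-in for the corruption used in denoising training:
    the model is trained to reconstruct the original sequence from the
    corrupted one, on both source- and target-side text.
    """
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]

# a (corrupted input, reconstruction target) pair for source-side denoising
src = "we present a simple pretraining strategy".split()
example = (add_denoising_noise(src), src)
```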
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale [48.0390317915984]
XLS-R is a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0.
We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages.
arXiv Detail & Related papers (2021-11-17T18:49:42Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural machine translation.
Specifically, we update the model parameters bidirectionally in the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
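Bidirectional training (BiT), as summarized above, can be read as augmenting the parallel data with the reverse direction during the early training stage. The sketch below shows one plausible way to build such a corpus; the direction tags and the mixing schedule are assumptions, not the paper's exact setup.

```python
def make_bidirectional_corpus(pairs, fwd_tag="<2tgt>", rev_tag="<2src>"):
    """Augment (source, target) pairs with the reverse direction.

    A rough sketch of the data side of bidirectional training: the model
    first trains on both directions, then is tuned normally on the
    original direction. The direction tags are illustrative assumptions.
    """
    forward = [(f"{fwd_tag} {src}", tgt) for src, tgt in pairs]
    backward = [(f"{rev_tag} {tgt}", src) for src, tgt in pairs]
    return forward + backward

pairs = [("guten Morgen", "good morning"), ("danke", "thank you")]
bidirectional = make_bidirectional_corpus(pairs)  # 4 examples: de->en plus en->de
```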
- Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation [50.0258495437314]
We investigate the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT).
For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings.
For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen.
arXiv Detail & Related papers (2020-04-30T16:09:22Z)
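The mBART recipe above (encoder and most of the decoder frozen) can be sketched as freezing parameters by name prefix. The prefixes and the choice of which decoder layers remain trainable are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn

def freeze_for_mt_finetuning(model: nn.Module, trainable_decoder_layers=(10, 11)) -> None:
    """Freeze the encoder and most decoder layers of an encoder-decoder model.

    A rough sketch of a "freeze most of the model" recipe. Assumes
    parameters are named like "encoder.layers.N..." and
    "decoder.layers.N..."; real checkpoints may differ.
    """
    for name, param in model.named_parameters():
        if name.startswith("encoder."):
            param.requires_grad = False          # whole encoder frozen
        elif name.startswith("decoder.layers."):
            layer_id = int(name.split(".")[2])
            param.requires_grad = layer_id in trainable_decoder_layers
        else:
            param.requires_grad = True           # e.g. embeddings, output projection
```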
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
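Random online backtranslation, as proposed in the last entry, creates training examples for unseen (zero-shot) language pairs by translating target sentences into a randomly chosen other language on the fly and pairing the result with the original target. The translate(model, text, lang) helper below is hypothetical, standing in for decoding with the current model.

```python
import random

def random_online_backtranslation_step(model, batch, languages, translate):
    """Create synthetic training examples for unseen language pairs.

    A rough sketch of random online backtranslation: for each example,
    pick a random other language, translate the target sentence into it
    with the current model, and pair the synthetic sentence with the
    original target. `translate(model, text, lang)` is a hypothetical
    helper standing in for on-the-fly decoding.
    """
    synthetic = []
    for src, tgt, tgt_lang in batch:
        intermediate = random.choice([lang for lang in languages if lang != tgt_lang])
        synthetic_src = translate(model, tgt, intermediate)
        synthetic.append((synthetic_src, tgt, tgt_lang))
    return synthetic
```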