Improved Cross-Lingual Transfer Learning For Automatic Speech
Translation
- URL: http://arxiv.org/abs/2306.00789v4
- Date: Thu, 25 Jan 2024 07:45:45 GMT
- Title: Improved Cross-Lingual Transfer Learning For Automatic Speech
Translation
- Authors: Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente,
Pablo Gimeno, Victoria Mingote, James Glass
- Abstract summary: We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, we achieve significantly better cross-lingual task knowledge transfer.
We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl.
- Score: 18.97234151624098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in multilingual speech-to-text translation is topical. Having a
single model that supports multiple translation tasks is desirable. The goal of
this work is to improve cross-lingual transfer learning in multilingual
speech-to-text translation via semantic knowledge distillation. We show that by
initializing the encoder of the encoder-decoder sequence-to-sequence
translation model with SAMU-XLS-R, a multilingual speech transformer encoder
trained using multi-modal (speech-text) semantic knowledge distillation, we
achieve significantly better cross-lingual task knowledge transfer than the
baseline XLS-R, a multilingual speech transformer encoder trained via
self-supervised learning. We demonstrate the effectiveness of our approach on
two popular datasets, namely, CoVoST-2 and Europarl. On the 21 translation
tasks of the CoVoST-2 benchmark, we achieve an average improvement of 12.8 BLEU
points over the baselines. In the zero-shot translation scenario, we achieve
average gains of 18.8 and 11.9 BLEU points on unseen medium- and low-resource
languages, respectively. We make similar observations on the Europarl speech
translation benchmark.
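The core recipe above (initializing the encoder of an encoder-decoder speech translation model from a pretrained multilingual speech encoder, itself trained with speech-text semantic knowledge distillation) can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the Hugging Face checkpoint for the baseline XLS-R encoder is real, but the SAMU-XLS-R path, the mBART-50 decoder choice, and the mean-pooled cosine distillation loss are assumptions made for the sketch.

```python
# Minimal sketch (not the paper's code): build a speech-to-text translation
# model whose encoder starts from a pretrained multilingual speech encoder.
import torch
import torch.nn.functional as F
from transformers import SpeechEncoderDecoderModel

# Baseline setup: encoder initialized from self-supervised XLS-R,
# decoder from a multilingual text model (mBART-50 chosen for illustration).
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",   # public XLS-R checkpoint (baseline)
    "facebook/mbart-large-50",        # multilingual decoder (assumption)
)

# Proposed setup: swap in a SAMU-XLS-R encoder. The path below is a
# placeholder; no official checkpoint name is implied.
# model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
#     "path/to/samu-xls-r", "facebook/mbart-large-50")

def semantic_kd_loss(speech_states: torch.Tensor,
                     text_embedding: torch.Tensor) -> torch.Tensor:
    """Illustrative speech-text distillation objective (SAMU-XLS-R style):
    pull a pooled utterance embedding toward a frozen multilingual text
    sentence embedding with a cosine loss. Mean pooling is a simplification;
    the original encoder may use a learned pooling mechanism."""
    pooled = speech_states.mean(dim=1)                      # (batch, hidden)
    return (1.0 - F.cosine_similarity(pooled, text_embedding, dim=-1)).mean()
```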
Related papers
- Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Learning Multilingual Sentence Representations with Cross-lingual
Consistency Regularization [46.09132547431629]
We introduce MuSR: a one-for-all Multilingual Sentence Representation model that supports more than 220 languages.
We train a multilingual Transformer encoder, coupled with an auxiliary Transformer decoder, by adopting a multilingual NMT framework.
Experimental results on multilingual similarity search and bitext mining tasks show the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-12T07:39:06Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language
Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models outperform baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- Scaling Up Deliberation for Multilingual ASR [36.860327600638705]
We investigate second-pass deliberation for multilingual speech recognition.
Our proposed deliberation is multilingual, i.e., the text encoder encodes hypothesis text from multiple languages, and the decoder attends to multilingual text and audio.
We show that deliberation improves the average WER on 9 languages by 4% relative compared to the single-pass model.
arXiv Detail & Related papers (2022-10-11T21:07:00Z)
- Multilingual Speech Translation with Unified Transformer: Huawei Noah's
Ark Lab at IWSLT 2021 [33.876412404781846]
This paper describes the system submitted to the IWSLT 2021 Speech Translation (MultiST) task from Huawei Noah's Ark Lab.
We use a unified transformer architecture for our MultiST model, so that the data from different modalities can be exploited to enhance the model's ability.
We apply several training techniques to improve the performance, including multi-task learning, task-level curriculum learning, data augmentation, etc.
arXiv Detail & Related papers (2021-06-01T02:50:49Z)
- Multilingual Transfer Learning for QA Using Translation as Data
Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
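"Language adversarial training" is commonly implemented with a gradient reversal layer in front of a language discriminator, so the shared encoder is pushed toward language-invariant representations. The sketch below shows that standard construction as a generic illustration; it is not necessarily the exact formulation used in the cited paper.

```python
# Generic gradient-reversal construction for language-adversarial training:
# a language classifier is trained on encoder outputs while reversed gradients
# push the shared encoder toward language-invariant representations.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None

class LanguageDiscriminator(nn.Module):
    def __init__(self, hidden: int, n_langs: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_langs))

    def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
        return self.clf(GradReverse.apply(sent_emb, self.lambd))

# adversarial_loss = nn.CrossEntropyLoss()(
#     discriminator(pooled_encoder_output), language_ids)
```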
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
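One way to read "plugging a cross-attention module into the Transformer encoder" is an extra attention sub-layer that lets each sentence attend to the hidden states of its paired sentence in another language. The sketch below is a generic illustration under that reading; the layer structure and dimensions are assumptions, not taken from the VECO implementation.

```python
# Generic sketch: an encoder layer with an additional cross-attention sub-layer
# that conditions one language's representation on the paired sentence's states.
import torch
import torch.nn as nn

class CrossLingualEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 768, nhead: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor, paired_states: torch.Tensor = None):
        x = self.norms[0](x + self.self_attn(x, x, x)[0])
        if paired_states is not None:
            # Cross-attend to the parallel sentence in the other language.
            x = self.norms[1](x + self.cross_attn(x, paired_states,
                                                  paired_states)[0])
        return self.norms[2](x + self.ffn(x))
```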
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Multilingual Speech Translation with Efficient Finetuning of Pretrained
Models [82.22294901727933]
A minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
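LNA (LayerNorm and Attention) finetuning updates only the LayerNorm and attention parameters of a pretrained model while everything else stays frozen. The helper below is a minimal sketch of that parameter selection; the substring patterns are assumptions that depend on the specific model's parameter naming.

```python
# Minimal sketch of LNA-style finetuning: freeze everything, then re-enable
# gradients only for LayerNorm and attention parameters. The name patterns
# are assumptions and must match the actual model's parameter names.
import torch.nn as nn

def apply_lna_finetuning(model: nn.Module,
                         patterns=("layer_norm", "layernorm",
                                   "self_attn", "encoder_attn")):
    for name, param in model.named_parameters():
        param.requires_grad = any(p in name.lower() for p in patterns)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"finetuning {trainable}/{total} parameters")
    return model
```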
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
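A KL-divergence self-teaching loss of the kind mentioned above can be written as the KL divergence between the model's predictive distribution on translated target-language text and an auto-generated soft pseudo-label distribution. The sketch below shows only that loss term; the batching, pseudo-label generation, and temperature are assumptions for illustration.

```python
# Sketch of a KL-divergence self-teaching loss: match the model's predictions
# on translated text to soft pseudo-labels (e.g., its own predictions on the
# original-language input, detached so they act as a fixed teacher signal).
import torch
import torch.nn.functional as F

def self_teaching_kl(student_logits: torch.Tensor,
                     pseudo_label_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    q = F.softmax(pseudo_label_logits.detach() / temperature, dim=-1)
    # KL(q || p), averaged over the batch.
    return F.kl_div(log_p, q, reduction="batchmean")
```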
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Cross-lingual Retrieval for Iterative Self-Supervised Training [66.3329263451598]
Cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs.
We develop a new approach -- cross-lingual retrieval for iterative self-supervised training.
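Mining sentence pairs from the encoder's own outputs, as described above, typically amounts to nearest-neighbour search over sentence embeddings. The sketch below shows a simple cosine-similarity version with mutual-nearest-neighbour filtering; real mining pipelines usually add margin-based scoring and approximate search, which are omitted here as assumptions of the sketch.

```python
# Simple sketch of cross-lingual retrieval for mining pseudo-parallel pairs:
# embed both monolingual corpora, then keep mutual nearest neighbours under
# cosine similarity.
import torch
import torch.nn.functional as F

def mine_pairs(src_emb: torch.Tensor, tgt_emb: torch.Tensor):
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    sim = src @ tgt.T                  # (n_src, n_tgt) cosine similarities
    fwd = sim.argmax(dim=1)            # best target for each source sentence
    bwd = sim.argmax(dim=0)            # best source for each target sentence
    # Keep only mutual nearest neighbours as mined sentence pairs.
    keep = bwd[fwd] == torch.arange(src.size(0))
    return [(i, fwd[i].item()) for i in torch.nonzero(keep).flatten().tolist()]
```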
arXiv Detail & Related papers (2020-06-16T21:30:51Z)