Self-Supervised Representations Improve End-to-End Speech Translation
- URL: http://arxiv.org/abs/2006.12124v2
- Date: Sun, 25 Oct 2020 03:31:15 GMT
- Title: Self-Supervised Representations Improve End-to-End Speech Translation
- Authors: Anne Wu, Changhan Wang, Juan Pino, Jiatao Gu
- Abstract summary: We show that self-supervised pre-trained features can consistently improve the translation performance.
Cross-lingual transfer allows to extend to a variety of languages without or with little tuning.
- Score: 57.641761472372814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end speech-to-text translation can provide a simpler and smaller
system but is facing the challenge of data scarcity. Pre-training methods can
leverage unlabeled data and have been shown to be effective on data-scarce
settings. In this work, we explore whether self-supervised pre-trained speech
representations can benefit the speech translation task in both high- and
low-resource settings, whether they can transfer well to other languages, and
whether they can be effectively combined with other common methods that help
improve low-resource end-to-end speech translation such as using a pre-trained
high-resource speech recognition system. We demonstrate that self-supervised
pre-trained features can consistently improve the translation performance, and
cross-lingual transfer allows to extend to a variety of languages without or
with little tuning.
Related papers
- Cross-Lingual Transfer Learning for Speech Translation [7.802021866251242]
Zero-shot cross-lingual transfer has been demonstrated on a range of NLP tasks.
We explore whether speech-based models exhibit the same transfer capability.
arXiv Detail & Related papers (2024-07-01T09:51:48Z) - Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z) - Enhancing expressivity transfer in textless speech-to-speech translation [0.0]
Existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages.
This study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings.
We demonstrate how these embeddings can be used to effectively predict the pitch and duration of speech units in the target language.
arXiv Detail & Related papers (2023-10-11T08:07:22Z) - Direct Speech-to-speech Translation without Textual Annotation using
Bottleneck Features [13.44542301438426]
We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information.
Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach.
arXiv Detail & Related papers (2022-12-12T10:03:10Z) - Improving End-to-end Speech Translation by Leveraging Auxiliary Speech
and Text Data [38.816953592085156]
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems.
It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text)
arXiv Detail & Related papers (2022-12-04T09:27:56Z) - Simple and Effective Unsupervised Speech Translation [68.25022245914363]
We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z) - Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.