Improving Speech-to-Speech Translation Through Unlabeled Text
- URL: http://arxiv.org/abs/2210.14514v1
- Date: Wed, 26 Oct 2022 06:52:19 GMT
- Title: Improving Speech-to-Speech Translation Through Unlabeled Text
- Authors: Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov
and Hongyu Gong
- Abstract summary: Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm.
We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data.
- Score: 39.28273721043411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Direct speech-to-speech translation (S2ST) is among the most challenging
problems in the translation paradigm due to the significant scarcity of S2ST
data. While efforts have been made to increase the data size from unlabeled
speech by cascading pretrained speech recognition (ASR), machine translation
(MT), and text-to-speech (TTS) models, unlabeled text has remained relatively
under-utilized for improving S2ST. We propose an effective way to utilize the
massive amount of existing unlabeled text in different languages to create
large quantities of S2ST data, improving S2ST performance by applying various
acoustic effects to the generated synthetic data. Empirically, our method
outperforms the
state of the art in Spanish-English translation by up to 2 BLEU. Significant
gains by the proposed method are demonstrated in extremely low-resource
settings for both Spanish-English and Russian-English translations.
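The data-creation recipe described in the abstract ends with applying acoustic effects to synthetic target speech. As a minimal sketch of that final step, the snippet below implements two common effects, additive noise at a target SNR and naive speed perturbation via resampling. The function names, parameter values, and the plain-numpy implementation are illustrative assumptions, not the paper's actual augmentation pipeline.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Mix Gaussian noise into a waveform at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def speed_perturb(wave, factor):
    """Naive speed change by linear resampling (does not preserve pitch)."""
    n_out = int(len(wave) / factor)
    idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(idx, np.arange(len(wave)), wave)

# Toy "synthetic TTS output": one second of a 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)

# Chain the effects: add noise at 20 dB SNR, then slow down to 0.9x speed.
augmented = speed_perturb(add_noise(wave, snr_db=20, rng=rng), factor=0.9)
```

In practice such perturbations are applied with randomized parameters per utterance, so that each synthetic example contributes acoustically varied training data.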
Related papers
- Enhancing Speech-to-Speech Translation with Multiple TTS Targets [62.18395387305803]
We analyze the effect of changing synthesized target speech for direct S2ST models.
We propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems.
arXiv Detail & Related papers (2023-04-10T14:33:33Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from the data scarcity problem because parallel corpora pairing source-language speech with target-language speech are very rare.
We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- Simple and Effective Unsupervised Speech Translation [68.25022245914363]
We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z)
- Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation [29.103046944157484]
We build an S2ST Transformer baseline which outperforms the original Translatotron.
We utilize the external data by pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set.
arXiv Detail & Related papers (2022-05-18T15:24:02Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation [32.24706553793383]
Speech-to-speech translation (S2ST) without relying on intermediate text representations is a rapidly emerging frontier of research.
Recent works have demonstrated that the performance of such direct S2ST systems is approaching that of conventional cascade S2ST when trained on comparable datasets.
In this work, we explore multiple approaches for leveraging much more widely available unsupervised and weakly-supervised speech and text data to improve the performance of direct S2ST based on Translatotron 2.
arXiv Detail & Related papers (2022-03-24T21:06:15Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.