Simple and Effective Unsupervised Speech Translation
- URL: http://arxiv.org/abs/2210.10191v1
- Date: Tue, 18 Oct 2022 22:26:13 GMT
- Title: Simple and Effective Unsupervised Speech Translation
- Authors: Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun
Tang, Wei-Ning Hsu, Michael Auli, Juan Pino
- Abstract summary: We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
- Score: 68.25022245914363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The amount of labeled data to train models for speech tasks is limited for
most languages; however, the data scarcity is exacerbated for speech
translation, which requires labeled data covering two different languages. To
address this issue, we study a simple and effective approach to build speech
translation systems without labeled data by leveraging recent advances in
unsupervised speech recognition, machine translation and speech synthesis,
either in a pipeline approach, or to generate pseudo-labels for training
end-to-end speech translation models. Furthermore, we present an unsupervised
domain adaptation technique for pre-trained speech models which improves the
performance of downstream unsupervised speech recognition, especially for
low-resource settings. Experiments show that unsupervised speech-to-text
translation outperforms the previous unsupervised state of the art by 3.2 BLEU
on the Libri-Trans benchmark. On CoVoST 2, our best systems outperform the best
supervised end-to-end models (without pre-training) from only two years ago by
an average of 5.0 BLEU over five X-En directions. We also report competitive
results on MuST-C and CVSS benchmarks.
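The abstract describes two ways of using unsupervised components: chaining them as a cascade (ASR followed by MT), or running that cascade over unlabeled speech to generate pseudo-labels for training an end-to-end model. The sketch below illustrates only the control flow of these two strategies with hypothetical toy stand-ins; the paper's actual systems use unsupervised speech recognition, machine translation, and speech synthesis models, none of which are reproduced here.

```python
# Toy sketch of the two strategies described in the abstract.
# `unsupervised_asr` and `unsupervised_mt` are hypothetical stand-ins,
# not the paper's models.

def unsupervised_asr(audio_id):
    # Stand-in: a real system transcribes source-language audio with a
    # model trained without any transcripts (e.g. unsupervised ASR).
    toy_transcripts = {"utt_fr_001": "bonjour le monde"}
    return toy_transcripts.get(audio_id, "")

def unsupervised_mt(text):
    # Stand-in: a real system uses unsupervised machine translation;
    # here, a toy word-for-word lexicon.
    lexicon = {"bonjour": "hello", "le": "the", "monde": "world"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def cascaded_st(audio_id):
    # Strategy 1 (pipeline): chain unsupervised ASR and MT.
    return unsupervised_mt(unsupervised_asr(audio_id))

def make_pseudo_labels(unlabeled_audio_ids):
    # Strategy 2 (pseudo-labeling): run the cascade over unlabeled speech
    # to create (audio, translation) pairs for training an end-to-end
    # speech translation model.
    return [(a, cascaded_st(a)) for a in unlabeled_audio_ids]

print(cascaded_st("utt_fr_001"))
print(make_pseudo_labels(["utt_fr_001"]))
```

The point of the pseudo-labeling variant is that the end-to-end model trained on these pairs can outgrow the cascade that produced them, since it avoids compounding ASR and MT errors at inference time.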
Related papers
- Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation [70.33052952571884]
We propose to build a cascaded speech translation system without leveraging any kind of paired data.
We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS.
arXiv Detail & Related papers (2023-05-12T13:07:51Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST), which translates speech in one language into speech in another.
We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Textless Direct Speech-to-Speech Translation with Discrete Speech Representation [27.182170555234226]
We propose a novel model, Textless Translatotron, for training an end-to-end direct S2ST model without any textual supervision.
When a speech encoder pre-trained on unsupervised speech data is used for both models, the proposed model obtains translation quality nearly on par with Translatotron 2.
arXiv Detail & Related papers (2022-10-31T19:48:38Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages over cascaded S2ST.
Direct S2ST suffers from data scarcity because corpora pairing source-language speech with target-language speech are very rare.
We propose a Speech2S model, jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- Revisiting End-to-End Speech-to-Text Translation From Scratch [48.203394370942505]
End-to-end (E2E) speech-to-text translation (ST) often depends on pre-training its encoder and/or decoder using source transcripts via speech recognition or text translation tasks.
In this paper, we explore the extent to which the quality of E2E ST trained on speech-translation pairs alone can be improved.
arXiv Detail & Related papers (2022-06-09T15:39:19Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training [33.02912456062474]
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation.
arXiv Detail & Related papers (2021-10-20T00:59:36Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.