Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech
Translation
- URL: http://arxiv.org/abs/2205.08993v1
- Date: Wed, 18 May 2022 15:24:02 GMT
- Title: Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech
Translation
- Authors: Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu
Zhang
- Abstract summary: We build an S2ST Transformer baseline that outperforms the original Translatotron.
We utilize external data by pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set.
- Score: 29.103046944157484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct speech-to-speech translation (S2ST) has drawn increasing attention
recently. The task is very challenging due to data scarcity and the complex
speech-to-speech mapping. In this paper, we report our recent achievements in
S2ST. First, we build an S2ST Transformer baseline that outperforms the
original Translatotron. Second, we utilize external data by
pseudo-labeling and obtain a new state-of-the-art result on the Fisher
English-to-Spanish test set. In particular, we exploit the pseudo-labeled data
with a combination of popular techniques that are nontrivial to apply to S2ST.
Moreover, we evaluate our approach on both syntactically similar
(Spanish-English) and distant (English-Chinese) language pairs. Our
implementation is available at
https://github.com/fengpeng-yue/speech-to-speech-translation.
Related papers
- Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? [49.42189569058647]
Two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS).
In this paper, we introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model.
We also propose a novel training method ComSpeech-ZS that solely utilizes S2TT and TTS data.
arXiv Detail & Related papers (2024-06-11T14:17:12Z)
- Enhancing Speech-to-Speech Translation with Multiple TTS Targets [62.18395387305803]
We analyze the effect of changing synthesized target speech for direct S2ST models.
We propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems.
arXiv Detail & Related papers (2023-04-10T14:33:33Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from data scarcity because corpora pairing source-language speech with target-language speech are very rare.
In this paper, we propose Speech2S, a model that is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- Improving Speech-to-Speech Translation Through Unlabeled Text [39.28273721043411]
Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm.
We propose an effective way to utilize the massive existing unlabeled text from different languages to create a large amount of S2ST data.
arXiv Detail & Related papers (2022-10-26T06:52:19Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge of modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)