Joint Pre-Training with Speech and Bilingual Text for Direct Speech to
Speech Translation
- URL: http://arxiv.org/abs/2210.17027v1
- Date: Mon, 31 Oct 2022 02:55:51 GMT
- Title: Joint Pre-Training with Speech and Bilingual Text for Direct Speech to
Speech Translation
- Authors: Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He,
Jinyu Li, Furu Wei
- Abstract summary: Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from data scarcity because parallel corpora pairing source-language speech with target-language speech are scarce.
We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
- Score: 94.80029087828888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct speech-to-speech translation (S2ST) is an attractive research topic
with many advantages compared to cascaded S2ST. However, direct S2ST suffers
from data scarcity, because parallel corpora pairing source-language speech
with target-language speech are scarce. To address this issue,
we propose in this paper a Speech2S model, which is jointly pre-trained with
unpaired speech and bilingual text data for direct speech-to-speech translation
tasks. By effectively leveraging the paired text data, Speech2S is capable of
modeling the cross-lingual speech conversion from source to target language. We
verify the performance of the proposed Speech2S on Europarl-ST and VoxPopuli
datasets. Experimental results demonstrate that Speech2S achieves an
improvement of about 5 BLEU points over encoder-only pre-training models, and
performs competitively with or even better than existing state-of-the-art
models.
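The reported gains are measured in BLEU points. As context for those numbers, the following is a minimal, stdlib-only sketch of how corpus-level BLEU is computed: geometric mean of modified n-gram precisions (n = 1..4) times a brevity penalty. It assumes one reference per hypothesis and whitespace tokenization; in practice toolkits such as sacreBLEU handle tokenization and multiple references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100): geometric mean of modified
    n-gram precisions up to max_n, scaled by a brevity penalty."""
    match = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n   # candidate n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by its count in the reference.
            match[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:  # any zero precision drives the geometric mean to 0
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

A perfect match scores 100, e.g. `corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])` returns `100.0`; a "5 BLEU point" improvement refers to an absolute difference on this 0-100 scale.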
Related papers
- Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? [49.42189569058647]
Two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) synthesis.
In this paper, we introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model.
We also propose a novel training method ComSpeech-ZS that solely utilizes S2TT and TTS data.
arXiv Detail & Related papers (2024-06-11T14:17:12Z)
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- Enhancing Speech-to-Speech Translation with Multiple TTS Targets [62.18395387305803]
We analyze the effect of changing synthesized target speech for direct S2ST models.
We propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems.
arXiv Detail & Related papers (2023-04-10T14:33:33Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.