Direct Speech-to-Speech Neural Machine Translation: A Survey
- URL: http://arxiv.org/abs/2411.14453v1
- Date: Wed, 13 Nov 2024 13:01:21 GMT
- Title: Direct Speech-to-Speech Neural Machine Translation: A Survey
- Authors: Mahendra Gupta, Maitreyee Dutta, Chandresh Kumar Maurya
- Abstract summary: Speech-to-Speech Translation (S2ST) models transform speech from one language to another target language with the same linguistic information. In recent years, researchers have introduced direct S2ST models, which have the potential to translate speech without relying on intermediate text generation. However, direct S2ST has yet to achieve quality performance for seamless communication and still lags behind the cascade models in terms of performance.
- Score: 0.8999666725996978
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Speech-to-Speech Translation (S2ST) models transform speech in one language into speech in another language while preserving the same linguistic information. S2ST is important for bridging communication gaps among communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which can translate speech without relying on intermediate text generation, offer lower decoding latency, and can preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to achieve the quality needed for seamless communication and still lags behind cascade models in performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey of direct S2ST systems is available that beginners and advanced researchers alike can consult for a quick overview. The present work provides a comprehensive review of direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance on benchmark datasets and outline research challenges and future directions.
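The contrast the survey draws between cascade and direct S2ST can be sketched with toy stand-ins. A minimal sketch, assuming nothing about any real model: the function names and bracketed audio-token strings below are hypothetical placeholders (dictionary lookups), not actual model APIs.

```python
# Toy sketch of the two S2ST architectures: cascade (ASR -> MT -> TTS)
# versus direct (one speech-to-speech mapping, no intermediate text).
# All components here are dictionary lookups standing in for real models.

def asr(src_speech):
    """Toy ASR: source speech -> source-language text."""
    return {"[en-audio:hello]": "hello"}[src_speech]

def mt(src_text):
    """Toy MT: source text -> target-language text."""
    return {"hello": "bonjour"}[src_text]

def tts(tgt_text):
    """Toy TTS: target text -> target speech."""
    return f"[fr-audio:{tgt_text}]"

def cascade_s2st(src_speech):
    # Each stage feeds the next, so an error in any stage
    # propagates downstream (the error-propagation problem).
    return tts(mt(asr(src_speech)))

def direct_s2st(src_speech):
    # A single model maps source speech to target speech with no
    # intermediate text, which is why prosody and paralinguistic
    # cues can in principle be preserved; simulated here by one lookup.
    return {"[en-audio:hello]": "[fr-audio:bonjour]"}[src_speech]

print(cascade_s2st("[en-audio:hello]"))  # [fr-audio:bonjour]
print(direct_s2st("[en-audio:hello]"))   # [fr-audio:bonjour]
```

Both pipelines produce the same output on this toy input; the survey's point is that the direct mapping avoids the intermediate text bottleneck at the cost of a much harder training problem.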
Related papers
- S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information [47.950757976473035]
We introduce S2S-Arena, a novel arena-style S2S benchmark that evaluates instruction-following capabilities with paralinguistic information.
Beyond the superior performance of GPT-4o, a cascaded ASR, LLM, and TTS pipeline outperforms jointly trained models with text-speech alignment in speech2speech protocols.
arXiv Detail & Related papers (2025-03-07T02:07:00Z) - Direct Speech to Speech Translation: A Review [0.0]
Speech to speech translation (S2ST) is a transformative technology that bridges global communication gaps.
Traditional cascade models that rely on automatic speech recognition (ASR), machine translation (MT), and text to speech (TTS) components suffer from error propagation, increased latency, and loss of prosody.
Direct S2ST models retain speaker identity, reduce latency, and improve translation naturalness by preserving vocal characteristics and prosody.
arXiv Detail & Related papers (2025-03-03T06:48:22Z) - Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? [49.42189569058647]
Two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) synthesis.
In this paper, we introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model.
We also propose a novel training method ComSpeech-ZS that solely utilizes S2TT and TTS data.
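The two-pass decomposition described above can be sketched with toy stand-ins. This is a hypothetical illustration, not the actual ComSpeech modules: the function names and bracketed audio-token strings are placeholders, and the glue that integrates pretrained S2TT and TTS into a single direct model is not modeled here.

```python
# Toy sketch of two-pass S2ST: S2TT (source speech -> target text)
# followed by TTS (target text -> target speech).
# Both components are dictionary lookups standing in for pretrained models.

def s2tt(src_speech):
    """Toy speech-to-text translation: source speech -> target-language text."""
    return {"[de-audio:hallo]": "hello"}[src_speech]

def tts(tgt_text):
    """Toy TTS: target-language text -> target speech."""
    return f"[en-audio:{tgt_text}]"

def two_pass_s2st(src_speech):
    # First pass translates into target text; second pass synthesizes speech.
    # ComSpeech's contribution is integrating pretrained versions of these
    # two stages so the composite behaves as one direct S2ST model.
    return tts(s2tt(src_speech))

print(two_pass_s2st("[de-audio:hallo]"))  # [en-audio:hello]
```

Because each stage can be pretrained on its own (S2TT and TTS) data, this decomposition sidesteps the need for parallel speech-to-speech corpora, which is the premise of the ComSpeech-ZS training method.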
arXiv Detail & Related papers (2024-06-11T14:17:12Z) - End-to-End Speech-to-Text Translation: A Survey [0.0]
Speech-to-text translation is the task of converting speech signals in one language into text in another language.
Automatic Speech Recognition (ASR) and Machine Translation (MT) models play crucial roles in traditional ST systems.
arXiv Detail & Related papers (2023-12-02T07:40:32Z) - Textless Speech-to-Speech Translation With Limited Parallel Data [51.3588490789084]
PFB is a framework for training textless S2ST models that require just dozens of hours of parallel speech data.
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains.
arXiv Detail & Related papers (2023-05-24T17:59:05Z) - Enhancing Speech-to-Speech Translation with Multiple TTS Targets [62.18395387305803]
We analyze the effect of changing synthesized target speech for direct S2ST models.
We propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems.
arXiv Detail & Related papers (2023-04-10T14:33:33Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from data scarcity because parallel corpora pairing source-language speech with target-language speech are very rare.
In this paper, we propose Speech2S, a model jointly pre-trained on unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z) - Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.