Selective Data Augmentation for Robust Speech Translation
- URL: http://arxiv.org/abs/2304.03169v2
- Date: Tue, 25 Apr 2023 11:05:55 GMT
- Title: Selective Data Augmentation for Robust Speech Translation
- Authors: Rajul Acharya, Ashish Panda, Sunil Kumar Kopparapu
- Abstract summary: We propose an e2e architecture for English-Hindi (en-hi) ST.
We use two imperfect machine translation (MT) services to translate Libri-trans en text into hi text.
We show that this results in a better ST BLEU score compared to brute-force augmentation of MT data.
- Score: 17.56859840101276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech translation (ST) systems translate speech in one language to text in another language. End-to-end ST systems (e2e-ST) have gained popularity over cascade systems because of their enhanced performance owing to reduced latency and computational cost. Though resource intensive, e2e-ST systems have the inherent ability, unlike cascade systems, to retain para-linguistic and non-linguistic characteristics of the speech. In this paper, we propose to use an e2e architecture for English-Hindi (en-hi) ST. We use two imperfect machine translation (MT) services to translate Libri-trans en text into hi text. While each service individually provides MT data that can be used to generate parallel ST data, we propose a data augmentation strategy over the noisy MT data to aid robust ST. The main contribution of this paper is this data augmentation strategy. We show that it results in a better ST BLEU score than brute-force augmentation of the MT data, and we observe an absolute improvement of 1.59 BLEU with our approach.
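As a rough illustration of the pipeline described in the abstract, the sketch below shows how parallel en-hi ST pairs could be assembled from two imperfect MT services and filtered before augmentation. This is a minimal sketch, not the paper's algorithm: the cross-service agreement filter, the threshold, and the translate_service_a / translate_service_b stand-ins are assumptions made here for illustration only.

```python
# Minimal sketch (not the paper's actual strategy) of building en-hi ST training
# pairs from two imperfect MT services and keeping them selectively.
from difflib import SequenceMatcher


def translate_service_a(en_text: str) -> str:
    """Stand-in for the first (imperfect) MT service."""
    raise NotImplementedError


def translate_service_b(en_text: str) -> str:
    """Stand-in for the second (imperfect) MT service."""
    raise NotImplementedError


def build_st_pairs(libri_trans, agreement_threshold=0.5):
    """libri_trans: iterable of (audio_path, en_transcript) tuples.

    Brute-force augmentation would keep both hi hypotheses for every utterance;
    this sketch keeps a pair only when the two noisy translations roughly agree.
    """
    st_pairs = []
    for audio_path, en_text in libri_trans:
        hi_a = translate_service_a(en_text)
        hi_b = translate_service_b(en_text)
        # Hypothetical selection rule: character-level agreement between the
        # two noisy hi translations, used as a cheap proxy for MT quality.
        agreement = SequenceMatcher(None, hi_a, hi_b).ratio()
        if agreement >= agreement_threshold:
            st_pairs.append((audio_path, hi_a))
            st_pairs.append((audio_path, hi_b))
    return st_pairs
```

Dropping the agreement check and keeping every hypothesis from both services recovers the brute-force augmentation baseline that the abstract compares against.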
Related papers
- Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z)
- DUB: Discrete Unit Back-translation for Speech Translation [32.74997208667928]
We propose Discrete Unit Back-translation (DUB) to answer two questions: Is it better to represent speech with discrete units than with continuous features in direct ST?
With DUB, the back-translation technique can successfully be applied to direct ST and obtains an average boost of 5.5 BLEU on MuST-C En-De/Fr/Es.
In the low-resource language scenario, our method achieves comparable performance to existing methods that rely on large-scale external data.
arXiv Detail & Related papers (2023-05-19T03:48:16Z)
- Back Translation for Speech-to-text Translation Without Transcripts [11.13240570688547]
We develop a back translation algorithm for ST (BT4ST) to synthesize pseudo ST data from monolingual target data.
To ease the challenges posed by short-to-long generation and one-to-many mapping, we introduce self-supervised discrete units.
With our synthetic ST data, we achieve an average boost of 2.3 BLEU on MuST-C En-De, En-Fr, and En-Es datasets.
arXiv Detail & Related papers (2023-05-15T15:12:40Z)
- Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation [70.33052952571884]
We propose to build a cascaded speech translation system without leveraging any kind of paired data.
We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS.
arXiv Detail & Related papers (2023-05-12T13:07:51Z)
- Enhancing Speech-to-Speech Translation with Multiple TTS Targets [62.18395387305803]
We analyze the effect of changing synthesized target speech for direct S2ST models.
We propose a multi-task framework that jointly optimizes the S2ST system with multiple targets from different TTS systems.
arXiv Detail & Related papers (2023-04-10T14:33:33Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST), which translates speech from one language into another.
We present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Improving Speech-to-Speech Translation Through Unlabeled Text [39.28273721043411]
Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm.
We propose an effective way to utilize the massive amount of existing unlabeled text from different languages to create a large amount of S2ST data.
arXiv Detail & Related papers (2022-10-26T06:52:19Z)
- Generating Synthetic Speech from SpokenVocab for Speech Translation [18.525896864903416]
Training end-to-end speech translation systems requires sufficiently large-scale data.
One practical solution is to convert machine translation (MT) data to ST data via text-to-speech (TTS) systems.
We propose a simple, scalable and effective data augmentation technique, i.e., SpokenVocab, to convert MT data to ST data on-the-fly.
arXiv Detail & Related papers (2022-10-15T03:07:44Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques [12.968557512440759]
Several techniques have been proposed for zero-shot translation.
We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data.
The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points over direct end-to-end ST and +3.1 BLEU points over ST models fine-tuned from an ASR model.
arXiv Detail & Related papers (2022-01-26T20:20:59Z)
- Textless Speech-to-Speech Translation on Real Data [49.134208897722246]
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.