Speech-to-Speech Translation For A Real-world Unwritten Language
- URL: http://arxiv.org/abs/2211.06474v1
- Date: Fri, 11 Nov 2022 20:21:38 GMT
- Authors: Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An
Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong,
Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann
Lee
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study speech-to-speech translation (S2ST), which translates speech
from one language into another, and focus on building systems for languages
without a standard written form. Using English-Taiwanese Hokkien as a case
study, we present an end-to-end solution spanning training data collection,
modeling choices, and the release of a benchmark dataset. First, we present our
efforts on creating human-annotated data, automatically mining data from large
unlabeled speech datasets, and adopting pseudo-labeling to produce weakly
supervised data. On the modeling side, we take advantage of recent advances in
using self-supervised discrete representations as the prediction target in
S2ST, and show the effectiveness of leveraging additional text supervision from
Mandarin, a language similar to Hokkien, in model training. Finally, we release
an S2ST benchmark set to facilitate future research in this field. The demo can
be found at https://huggingface.co/spaces/facebook/Hokkien_Translation .
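The abstract's "self-supervised discrete representations as target for prediction" refers to mapping continuous speech features to discrete unit IDs (e.g. k-means clustering over HuBERT features) and predicting those units instead of text. A minimal illustrative sketch of that quantization step follows; the feature vectors and centroids are made-up placeholders, not the paper's actual model, which uses hundreds of clusters over learned representations:

```python
# Map continuous speech features to discrete unit IDs by nearest-centroid
# (k-means-style) quantization, then collapse consecutive repeats -- the
# kind of discrete target sequence used in unit-based S2ST.

def quantize(features, centroids):
    """Assign each feature vector the index of its nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist2(f, centroids[k]))
            for f in features]

def dedup(units):
    """Collapse consecutive repeated units, e.g. [3, 3, 7, 7, 1] -> [3, 7, 1]."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out

# Illustrative 2-D features and 3 centroids (placeholders only).
centroids = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
features = [(0.1, 0.0), (0.2, 0.1), (1.1, 0.9), (2.1, -0.1), (1.9, 0.2)]
units = dedup(quantize(features, centroids))
print(units)  # -> [0, 1, 2]
```

In the full pipeline, a speech-to-unit translation model predicts such unit sequences in the target language, and a separately trained unit vocoder synthesizes the waveform from them.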
Related papers
- Textually Pretrained Speech Language Models
We propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model.
We show, using both automatic and human evaluations, that TWIST outperforms a cold-start SpeechLM across the board.
arXiv Detail & Related papers (2023-05-22T13:12:16Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- Learning Cross-lingual Visual Speech Representations
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that: (1) multi-lingual models with more data outperform monolingual ones, but, when keeping the amount of data fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
We propose a novel model, Textless Translatotron, for training an end-to-end direct S2ST model without any textual supervision.
When a speech encoder pre-trained on unsupervised speech data is used for both models, the proposed model obtains translation quality nearly on par with Translatotron 2.
arXiv Detail & Related papers (2022-10-31T19:48:38Z)
- Simple and Effective Unsupervised Speech Translation
We study a simple and effective approach to build speech translation systems without labeled data.
We present an unsupervised domain adaptation technique for pre-trained speech models.
Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art.
arXiv Detail & Related papers (2022-10-18T22:26:13Z)
- Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z)
- Textless Speech-to-Speech Translation on Real Data
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language.
We tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data.
arXiv Detail & Related papers (2021-12-15T18:56:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.