Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in
Multitask End-to-End Speech Translation
- URL: http://arxiv.org/abs/2005.10678v2
- Date: Sat, 4 Jul 2020 01:04:58 GMT
- Authors: Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-yi Lee
- Abstract summary: Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language.
We propose to improve the multitask ST model by utilizing word embedding as the intermediate.
- Score: 127.54315184545796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech translation (ST) aims to learn transformations from speech in the
source language to the text in the target language. Previous works show that
multitask learning improves the ST performance, in which the recognition
decoder generates the text of the source language, and the translation decoder
obtains the final translations based on the output of the recognition decoder.
Because the semantic correctness of the recognition decoder's output is more
critical than its surface accuracy (i.e., word error rate), we propose to
improve the multitask ST model by using word embeddings as the intermediate
representation.
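The core idea, passing a continuous word-embedding intermediate to the translation decoder instead of discrete tokens, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the embedding table, logits, and dimensions are hypothetical stand-ins, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, T = 6, 4, 3  # toy sizes, purely illustrative

# Hypothetical embedding table and recognition-decoder output logits.
E = rng.normal(size=(vocab_size, embed_dim))
logits = rng.normal(size=(T, vocab_size))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hard intermediate: argmax tokens, then a table lookup (standard cascade).
hard_tokens = logits.argmax(axis=-1)
hard_embed = E[hard_tokens]

# Soft intermediate: the expected embedding under the decoder's distribution.
# This keeps the step differentiable and emphasizes semantics over exact
# token identity, which is the intuition behind "worse WER, better BLEU".
probs = softmax(logits)
soft_embed = probs @ E

print(hard_embed.shape, soft_embed.shape)
```

Because the soft intermediate is a probability-weighted mixture of embeddings, a recognition error that lands on a semantically similar word perturbs the translation decoder's input far less than a wrong discrete token would.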
Related papers
- Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features [18.76505158652759]
We propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation.
On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features.
On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation.
arXiv Detail & Related papers (2024-08-02T17:10:12Z) - Improving End-to-end Speech Translation by Leveraging Auxiliary Speech
and Text Data [38.816953592085156]
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems.
It enhances the system's ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text).
arXiv Detail & Related papers (2022-12-04T09:27:56Z) - Code-Switching without Switching: Language Agnostic End-to-End Speech
Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem.
By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z) - Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
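The pseudo-language induction described above can be sketched roughly as quantizing frame-level features against a codebook and collapsing repeats. This is a hedged toy sketch, not Wav2Seq's actual pipeline: the features and codebook here are random stand-ins for learned representations.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical frame-level speech features (T frames, D dims).
frames = rng.normal(size=(10, 8))
# Stand-in for a learned codebook of K discrete units.
codebook = rng.normal(size=(4, 8))

# Quantize each frame to its nearest codebook entry (squared L2 distance).
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
units = dists.argmin(axis=1)

# Collapse runs of repeated units into a compact pseudo-token sequence,
# yielding the "compact discrete representation" used as a pre-training target.
pseudo = [int(units[0])] + [int(u) for prev, u in zip(units, units[1:]) if u != prev]
print(units.tolist(), "->", pseudo)
```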
arXiv Detail & Related papers (2022-05-02T17:59:02Z) - Multilingual Speech Translation with Unified Transformer: Huawei Noah's
Ark Lab at IWSLT 2021 [33.876412404781846]
This paper describes the system submitted to the IWSLT 2021 Speech Translation (MultiST) task from Huawei Noah's Ark Lab.
We use a unified transformer architecture for our MultiST model, so that data from different modalities can be exploited to enhance the model's performance.
We apply several training techniques to improve the performance, including multi-task learning, task-level curriculum learning, data augmentation, etc.
arXiv Detail & Related papers (2021-06-01T02:50:49Z) - AlloST: Low-resource Speech Translation without Source Transcription [17.53382405899421]
We propose a learning framework that utilizes a language-independent universal phone recognizer.
The framework is based on an attention-based sequence-to-sequence model.
Experiments conducted on the Fisher Spanish-English and Taigi-Mandarin drama corpora show that our method outperforms the conformer-based baseline.
arXiv Detail & Related papers (2021-05-01T05:30:18Z) - Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z) - Curriculum Pre-training for End-to-End Speech Translation [51.53031035374276]
We propose a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages.
Experiments show that our curriculum pre-training method leads to significant improvements on En-De and En-Fr speech translation benchmarks.
arXiv Detail & Related papers (2020-04-21T15:12:07Z) - Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
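The bi-decoder training signal can be sketched as one shared encoding feeding two per-language output heads whose losses are summed. This is an illustrative NumPy toy under stated assumptions: the linear "decoders", vocabulary sizes, and random labels are hypothetical stand-ins, not BiDAN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # toy hidden size

# Stand-in for the shared encoder's output on one source sentence (5 positions).
h = rng.normal(size=(5, d))

# Two hypothetical linear "decoders", one per target end.
W_src = rng.normal(size=(d, 11))  # reconstructs source-language tokens
W_tgt = rng.normal(size=(d, 13))  # generates target-language tokens

def xent(logits, labels):
    """Mean cross-entropy via a numerically stable log-softmax."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

src_labels = rng.integers(0, 11, size=5)
tgt_labels = rng.integers(0, 13, size=5)

# Joint objective: gradients from both target ends shape the shared encoder,
# pushing it toward a language-independent semantic space.
loss = xent(h @ W_src, src_labels) + xent(h @ W_tgt, tgt_labels)
print(float(loss))
```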
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.