The IWSLT 2021 BUT Speech Translation Systems
- URL: http://arxiv.org/abs/2107.06155v1
- Date: Tue, 13 Jul 2021 15:11:18 GMT
- Title: The IWSLT 2021 BUT Speech Translation Systems
- Authors: Hari Krishna Vydana, Martin Karafiát, Lukáš Burget, Jan "Honza" Černocký
- Abstract summary: BUT's English to German offline speech translation (ST) systems developed for IWSLT 2021.
They are based on jointly trained Automatic Speech Recognition-Machine Translation models.
Their performance is evaluated on the MuST-C Common test set.
- Score: 2.4373900721120285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper describes BUT's English to German offline speech translation (ST)
systems developed for IWSLT 2021. They are based on jointly trained Automatic
Speech Recognition-Machine Translation models. Their performance is evaluated
on the MuST-C Common test set. In this work, we study their efficiency from the
perspective of having a large amount of separate ASR training data and MT
training data, and a smaller amount of speech-translation training data. Large
amounts of ASR and MT training data are utilized for pre-training the ASR and
MT models. Speech-translation data is used to jointly optimize ASR-MT models by
defining an end-to-end differentiable path from speech to translations. For
this purpose, we use the internal continuous representations from the
ASR decoder as the input to the MT module. We show that speech translation can be
further improved by training the ASR decoder jointly with the MT module using a
large amount of text-only MT training data. We also show significant
improvements by training an ASR module capable of generating punctuated text,
rather than leaving the punctuation task to the MT module.
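As a rough illustration of the coupling described above, the sketch below feeds the ASR decoder's continuous hidden states directly into an MT encoder, so the path from speech features to translation logits is differentiable end-to-end and the ASR and MT losses can be optimized jointly. This is a minimal PyTorch sketch under assumed settings: all module names, layer counts, dimensions, and vocabulary sizes are illustrative, not taken from the paper.

```python
# Minimal sketch of a jointly trained ASR-MT model: the MT module consumes the
# ASR decoder's hidden states (continuous representations) rather than its
# 1-best transcript, so gradients from the translation loss reach the ASR branch.
# Hypothetical sizes and names; not the authors' exact architecture.
import torch
import torch.nn as nn

class JointASRMT(nn.Module):
    def __init__(self, n_mels=80, d_model=256, vocab_src=5000, vocab_tgt=5000):
        super().__init__()
        # ASR branch: speech encoder + transcript decoder
        self.speech_proj = nn.Linear(n_mels, d_model)
        self.asr_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.src_embed = nn.Embedding(vocab_src, d_model)
        self.asr_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.asr_out = nn.Linear(d_model, vocab_src)
        # MT branch: encodes the ASR decoder states, then decodes the translation
        self.mt_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.tgt_embed = nn.Embedding(vocab_tgt, d_model)
        self.mt_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.mt_out = nn.Linear(d_model, vocab_tgt)

    def forward(self, speech, src_tokens, tgt_tokens):
        enc = self.asr_encoder(self.speech_proj(speech))
        # Continuous representations from the ASR decoder (teacher-forced here)
        asr_hidden = self.asr_decoder(self.src_embed(src_tokens), enc)
        asr_logits = self.asr_out(asr_hidden)
        # The MT module sees hidden states, keeping the whole path differentiable
        mt_enc = self.mt_encoder(asr_hidden)
        mt_hidden = self.mt_decoder(self.tgt_embed(tgt_tokens), mt_enc)
        return asr_logits, self.mt_out(mt_hidden)

if __name__ == "__main__":
    model = JointASRMT()
    speech = torch.randn(2, 120, 80)           # (batch, frames, mel bins)
    src = torch.randint(0, 5000, (2, 20))      # punctuated English transcript tokens
    tgt = torch.randint(0, 5000, (2, 25))      # German translation tokens
    asr_logits, mt_logits = model(speech, src, tgt)
    # Joint objective: ASR cross-entropy plus MT cross-entropy on one backward pass
    ce = nn.CrossEntropyLoss()
    loss = ce(asr_logits.transpose(1, 2), src) + ce(mt_logits.transpose(1, 2), tgt)
    loss.backward()
```

With this kind of coupling, text-only MT data can still update the MT module (and, through the shared decoder states, the ASR decoder) while the smaller speech-translation corpus is used for the full end-to-end path.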
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z) - Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z) - Enhanced Direct Speech-to-Speech Translation Using Self-supervised
Pre-training and Data Augmentation [76.13334392868208]
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues.
In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue.
arXiv Detail & Related papers (2022-04-06T17:59:22Z) - The HW-TSC's Offline Speech Translation Systems for IWSLT 2021
Evaluation [22.617563646374602]
This paper describes our participation in the IWSLT 2021 offline speech translation task.
Our system was built in a cascade form, including a speaker diarization module, an Automatic Speech Recognition (ASR) module and a Machine Translation (MT) module.
Our method achieves a BLEU score of 24.6 on the 2021 test set.
arXiv Detail & Related papers (2021-08-09T07:28:04Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - Machine Translation Customization via Automatic Training Data Selection
from the Web [97.98885151955467]
We describe an approach for customizing machine translation systems on specific domains.
We select data similar to the target customer data to train neural translation models.
Finally, we train MT models on our automatically selected data, obtaining a system specialized to the target domain.
arXiv Detail & Related papers (2021-02-20T03:29:41Z) - A Technical Report: BUT Speech Translation Systems [2.9327503320877457]
The paper describes BUT's speech translation systems.
The systems are English$\longrightarrow$German offline speech translation systems.
A large degradation is observed when translating ASR hypothesis compared to the oracle input text.
arXiv Detail & Related papers (2020-10-22T10:52:31Z) - Cascaded Models With Cyclic Feedback For Direct Speech Translation [14.839931533868176]
We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data.
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
arXiv Detail & Related papers (2020-10-21T17:18:51Z) - Improving Cross-Lingual Transfer Learning for End-to-End Speech
Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z) - Jointly Trained Transformers models for Spoken Language Translation [2.3886615435250302]
This work trains SLT systems with the ASR objective as an auxiliary loss, and the two networks are connected through neural hidden representations.
This architecture improved the BLEU score from 36.8 to 44.5.
All the experiments are reported on English-Portuguese speech translation task using How2 corpus.
arXiv Detail & Related papers (2020-04-25T11:28:39Z)