Robustness of Multi-Source MT to Transcription Errors
- URL: http://arxiv.org/abs/2305.16894v1
- Date: Fri, 26 May 2023 12:54:16 GMT
- Title: Robustness of Multi-Source MT to Transcription Errors
- Authors: Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre
- Abstract summary: In a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling.
We show that on a 10-hour ESIC corpus, the ASR errors in the original English speech and its simultaneous interpreting into German and Czech are mutually independent.
Our results show that multi-source neural machine translation has the potential to be useful in a real-time simultaneous translation setting.
- Score: 9.045660146260467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic speech translation is sensitive to speech recognition errors, but
in a multilingual scenario, the same content may be available in various
languages via simultaneous interpreting, dubbing or subtitling. In this paper,
we hypothesize that leveraging multiple sources will improve translation
quality if the sources complement one another in terms of correct information
they contain. To this end, we first show that on a 10-hour ESIC corpus, the ASR
errors in the original English speech and its simultaneous interpreting into
German and Czech are mutually independent. We then use two sources, English and
German, in a multi-source setting for translation into Czech to establish its
robustness to ASR errors. Furthermore, we observe this robustness when
translating both noisy sources together in a simultaneous translation setting.
Our results show that multi-source neural machine translation has the potential
to be useful in a real-time simultaneous translation setting, thereby
motivating further investigation in this area.
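A minimal illustrative sketch (not the authors' released code) of the two steps above: checking whether per-segment ASR error indicators in two parallel streams are statistically independent, and building a multi-source input by concatenation, one common realization of multi-source NMT. The separator token and function names are assumptions for illustration.

```python
# Illustrative sketch only -- not the paper's implementation.
# (a) chi-square check that per-segment ASR errors in two parallel streams
#     (e.g. English original vs. German interpreting) are mutually independent;
# (b) the common "concatenation" realization of multi-source NMT, joining two
#     possibly noisy sources into one input sequence for translation into Czech.
import numpy as np
from scipy.stats import chi2_contingency

def asr_errors_independent(err_en, err_de, alpha=0.05):
    """err_* are aligned per-segment binary flags ('segment contains an ASR
    error', e.g. WER against the gold transcript above some threshold)."""
    table = np.zeros((2, 2), dtype=int)
    for e, d in zip(err_en, err_de):
        table[int(e), int(d)] += 1  # 2x2 contingency table of error co-occurrence
    _, p_value, _, _ = chi2_contingency(table)
    return p_value >= alpha  # True = no evidence against independence

SEP = " <sep> "  # hypothetical separator token between the two sources

def multi_source_input(asr_en, asr_de):
    """Concatenate both (possibly noisy) sources into one input sequence for a
    seq2seq model trained on EN+DE sources and Czech targets."""
    return asr_en + SEP + asr_de
```

If one source is garbled by ASR, the other often still carries the correct content, which is exactly the complementarity the paper's hypothesis relies on.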
Related papers
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline post-filtering with IndicSBERT.
arXiv Detail & Related papers (2024-09-04T13:49:45Z)
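For the data-selection paper above, a hedged sketch of similarity-based filtering. It assumes the sentence-transformers library; the checkpoint name and threshold are illustrative placeholders, not the paper's exact configuration.

```python
# Hedged sketch of cross-lingual similarity filtering; the model name and
# threshold are assumptions, not the paper's exact setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("l3cube-pune/indic-sentence-similarity-sbert")  # assumed IndicSBERT-style checkpoint

def filter_pairs(en_sents, mr_sents, threshold=0.7):
    """Keep English-Marathi pairs whose cross-lingual cosine similarity is high."""
    en_emb = model.encode(en_sents, convert_to_tensor=True)
    mr_emb = model.encode(mr_sents, convert_to_tensor=True)
    sims = util.cos_sim(en_emb, mr_emb).diagonal()  # similarity of each aligned pair
    return [(e, m) for e, m, s in zip(en_sents, mr_sents, sims) if s >= threshold]
```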
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Is Robustness Transferable across Languages in Multilingual Neural Machine Translation? [45.04661608619081]
We investigate the transferability of robustness across different languages in multilingual neural machine translation.
Our findings demonstrate that the robustness gained in one translation direction can indeed transfer to other translation directions.
arXiv Detail & Related papers (2023-10-31T04:10:31Z)
- Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios [4.631167282648452]
We tackle the task of automatically discriminating between human and machine translations.
We perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models.
arXiv Detail & Related papers (2023-05-31T11:41:24Z)
- On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z)
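For the copying-problem paper above, a PyTorch sketch of what a language-discriminator loss can look like. The paper's actual training schedule and architecture are not reproduced here, and the loss weighting is an illustrative assumption.

```python
# Hedged sketch of a language-discriminator loss for UNMT training; the
# schedule and weighting in the paper differ -- this only shows the loss shape.
import torch
import torch.nn as nn

class LanguageDiscriminator(nn.Module):
    """Predicts the language of a (mean-pooled) sequence representation."""
    def __init__(self, hidden_dim: int, num_langs: int = 2):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_langs)

    def forward(self, seq_states: torch.Tensor) -> torch.Tensor:
        # seq_states: (batch, seq_len, hidden) decoder states
        return self.classifier(seq_states.mean(dim=1))

def total_loss(mt_loss, disc_logits, target_lang_ids, weight=0.1):
    """Translation loss plus a penalty when the output 'looks like' the wrong
    language, discouraging verbatim copying of the source sentence."""
    disc_loss = nn.functional.cross_entropy(disc_logits, target_lang_ids)
    return mt_loss + weight * disc_loss
```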
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
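For DecoMT above, an illustrative sketch of decomposed prompting. `llm_generate`, the prompt wording, and the chunk size are hypothetical stand-ins, not the paper's exact prompts.

```python
# Illustrative sketch of DecoMT-style decomposed prompting: split the source
# into word chunks, translate each chunk with a few-shot prompt, then produce
# a fluent final translation conditioned on the chunk translations.
# `llm_generate` is a hypothetical LLM call, not an API from the paper.

def chunks(words, size=4):
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

def decomposed_translate(source: str, llm_generate, few_shot_prefix: str) -> str:
    chunk_translations = []
    for chunk in chunks(source.split()):
        prompt = f"{few_shot_prefix}\nSource chunk: {chunk}\nTranslation:"
        chunk_translations.append(llm_generate(prompt).strip())
    # Second pass: stitch the chunk translations into a coherent sentence.
    stitched = " ".join(chunk_translations)
    final_prompt = f"{few_shot_prefix}\nRough translation: {stitched}\nFluent translation:"
    return llm_generate(final_prompt).strip()
```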
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- On the Influence of Machine Translation on Language Origin Obfuscation [0.3437656066916039]
We analyze the ability to detect the source language from the translated output of two widely used commercial machine translation systems.
Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text.
arXiv Detail & Related papers (2021-06-24T08:33:24Z)
- Lost in Interpreting: Speech Translation from Source or Interpreter? [0.0]
We release 10 hours of recordings and transcripts of European Parliament speeches in English, with simultaneous interpreting into Czech and German.
We evaluate quality and latency of speaker-based and interpreter-based spoken translation systems from English to Czech.
arXiv Detail & Related papers (2021-06-17T09:32:49Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
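For the translation-artifacts paper above, a tiny sketch of the lexical-overlap measurement its NLI observation refers to, here computed as Jaccard overlap over token sets (an illustrative choice of metric).

```python
# Minimal sketch: independently translated premise/hypothesis pairs tend to
# share fewer surface tokens, which this simple Jaccard measure would reveal.
def lexical_overlap(premise: str, hypothesis: str) -> float:
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(p | h), 1)  # Jaccard overlap in [0, 1]
```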
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.