A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling
- URL: http://arxiv.org/abs/2407.00108v1
- Date: Thu, 27 Jun 2024 11:20:14 GMT
- Title: A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling
- Authors: Sebastian Vincent, Charlotte Prescott, Chris Bayliss, Chris Oakley, Carolina Scarton
- Abstract summary: We report on an industrial case study carried out to investigate the benefit of machine translation (MT) in a professional scenario of translating TV subtitles.
We found that post-editors marked significantly fewer context-related errors when correcting the outputs of MTCue, the context-aware model.
We also present the results of a survey of the employed post-editors, which highlights contextual inadequacy as a significant gap consistently observed in MT.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Incorporating extra-textual context such as film metadata into the machine translation (MT) pipeline can enhance translation quality, as indicated by automatic evaluation in recent work. However, the positive impact of such systems in industry remains unproven. We report on an industrial case study carried out to investigate the benefit of MT in a professional scenario of translating TV subtitles with a focus on how leveraging extra-textual context impacts post-editing. We found that post-editors marked significantly fewer context-related errors when correcting the outputs of MTCue, the context-aware model, as opposed to non-contextual models. We also present the results of a survey of the employed post-editors, which highlights contextual inadequacy as a significant gap consistently observed in MT. Our findings strengthen the motivation for further work within fully contextual MT.
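For intuition, the sketch below shows one common way of exposing extra-textual context to an MT model: serialising metadata into a tagged prefix on the source sentence. This is an illustrative baseline only, not MTCue's actual architecture (MTCue embeds context with a dedicated context encoder), and the field names are invented.

```python
# A minimal sketch, NOT MTCue's actual architecture: one common baseline for
# contextual MT is to serialise extra-textual metadata into a tagged prefix
# on the source sentence. All field names below are illustrative assumptions.

from typing import Mapping

def build_contextual_source(source: str, metadata: Mapping[str, str]) -> str:
    """Prepend extra-textual metadata as tagged context to the source text."""
    context = " ".join(f"<{key}={value}>" for key, value in sorted(metadata.items()))
    return f"{context} <sep> {source}" if context else source

if __name__ == "__main__":
    print(build_contextual_source(
        "You should go.",
        {"genre": "period drama", "register": "formal"},
    ))
    # -> "<genre=period drama> <register=formal> <sep> You should go."
    # A context-aware model can use the register tag to pick, e.g., a formal
    # second-person form in the target language.
```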
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis of the underlying cause of under-translation in NMT, providing an explanation from the perspective of the decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
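A minimal sketch of the idea described above, treating the model's confidence in the EOS token as an under-translation detector; the threshold and penalty weight are illustrative assumptions, and the paper's exact formulation may differ:

```python
# Penalise beam candidates whose EOS confidence is low: a low-confidence EOS
# suggests the model may have stopped too early (high under-translation risk).

import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tokens: list          # generated target tokens
    log_prob: float       # cumulative log-probability of the hypothesis
    eos_log_prob: float   # log P(EOS | prefix) at the step EOS was emitted

def penalised_score(cand: Candidate, threshold: float = 0.5, weight: float = 2.0) -> float:
    """Length-normalised beam score with an under-translation penalty."""
    eos_confidence = math.exp(cand.eos_log_prob)
    base = cand.log_prob / max(len(cand.tokens), 1)
    # Subtract a penalty proportional to how far the EOS confidence falls
    # below the threshold; confident terminations are left untouched.
    penalty = weight * max(0.0, threshold - eos_confidence)
    return base - penalty
```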
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
- Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues [14.043741721036543]
This paper explores how context-awareness can improve the performance of current Neural Machine Translation (NMT) models for English-Japanese business dialogue translation.
We propose novel context tokens encoding extra-sentential information, such as speaker turn and scene type.
We find that models leverage both preceding sentences and extra-sentential context (with CXMI increasing with context size) and we provide a more focused analysis on honorifics translation.
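CXMI (conditional cross-mutual information) is the quantity referenced above: roughly H(Y|X) − H(Y|X,C), the average gap between the reference's token log-probabilities with and without context. A minimal sketch, assuming the per-token log-probabilities have already been extracted from a context-agnostic and a context-aware pass:

```python
# Estimate CXMI from per-token log-probabilities of two model passes.

def cxmi(logprobs_no_context: list, logprobs_with_context: list) -> float:
    """Each argument is a list of per-sentence lists of token log-probs.

    A positive value means the context-aware pass assigns the reference
    translation higher probability, i.e. the model is using the context.
    """
    total, n_tokens = 0.0, 0
    for no_ctx, with_ctx in zip(logprobs_no_context, logprobs_with_context):
        for lp_without, lp_with in zip(no_ctx, with_ctx):
            total += lp_with - lp_without
            n_tokens += 1
    return total / n_tokens if n_tokens else 0.0
```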
arXiv Detail & Related papers (2023-11-20T18:06:03Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation [72.6667341525552]
We present a new MMT approach based on a strong text-only MT model, which uses neural adapters and a novel guided self-attention mechanism.
We also introduce CoMMuTE, a Contrastive Multimodal Translation Evaluation set of ambiguous sentences and their possible translations.
Our approach obtains competitive results compared to strong text-only models on standard English-to-French, English-to-German and English-to-Czech benchmarks.
arXiv Detail & Related papers (2022-12-20T10:18:18Z)
- Supervised Visual Attention for Simultaneous Multimodal Machine Translation [47.18251159303909]
We propose the first Transformer-based simultaneous multimodal machine translation (MMT) architecture.
We extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments.
Our results show that supervised visual attention consistently improves the translation quality of the MMT models.
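One plausible form of such a supervision signal (an assumed formulation, not necessarily the paper's exact loss) is a cross-entropy term that pulls each phrase's attention distribution over image regions towards the gold distribution derived from the labelled phrase-region alignments:

```python
# Auxiliary loss guiding visual attention with gold phrase-region alignments.

import numpy as np

def attention_supervision_loss(attn: np.ndarray, gold: np.ndarray,
                               eps: float = 1e-9) -> float:
    """Cross-entropy between gold alignments and visual attention weights.

    attn: (num_phrases, num_regions), rows sum to 1 (softmax attention).
    gold: (num_phrases, num_regions), rows sum to 1 (gold alignment,
          possibly one-hot or label-smoothed).
    """
    return float(-(gold * np.log(attn + eps)).sum(axis=1).mean())

# The auxiliary term would be added to the usual translation loss, e.g.:
# total_loss = translation_loss + lambda_attn * attention_supervision_loss(a, g)
```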
arXiv Detail & Related papers (2022-01-23T17:25:57Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns [50.245845110446496]
We investigate the effect of future sentences as context by comparing the performance of a contextual NMT model trained with future context to one trained with past context.
Our experiments and evaluation, using generic and pronoun-focused automatic metrics, show that the use of future context achieves significant improvements over the context-agnostic Transformer.
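The contrast being tested can be pictured as a simple data-construction choice. A minimal sketch that builds a contextual source example from either the previous (past) or the next (future) sentence; the `<brk>` separator token is an illustrative assumption:

```python
# Build a contextual source example with one neighbouring sentence.

def with_context(doc: list, i: int, direction: str, sep: str = "<brk>") -> str:
    """Return sentence i of a document with one neighbouring sentence prepended."""
    if direction == "past":
        ctx = doc[i - 1] if i > 0 else ""
    else:  # "future" context, which helps e.g. cataphoric pronouns
        ctx = doc[i + 1] if i + 1 < len(doc) else ""
    return f"{ctx} {sep} {doc[i]}" if ctx else doc[i]
```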
arXiv Detail & Related papers (2020-04-21T10:45:48Z)
- When Does Unsupervised Machine Translation Work? [23.690875724726908]
We conduct an empirical evaluation of unsupervised machine translation (MT) using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages.
We find that performance rapidly deteriorates when source and target corpora are from different domains.
We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.
arXiv Detail & Related papers (2020-04-12T00:57:47Z)