Comparing Formulaic Language in Human and Machine Translation: Insight
from a Parliamentary Corpus
- URL: http://arxiv.org/abs/2206.10919v1
- Date: Wed, 22 Jun 2022 08:59:10 GMT
- Title: Comparing Formulaic Language in Human and Machine Translation: Insight
from a Parliamentary Corpus
- Authors: Yves Bestgen
- Abstract summary: The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator.
The results confirm the observations on the news corpus, but the differences are less strong.
They suggest that the use of text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A recent study has shown that, compared to human translations, neural machine
translations contain more strongly-associated formulaic sequences made of
relatively high-frequency words, but far less strongly-associated formulaic
sequences made of relatively rare words. These results were obtained on the
basis of translations of quality newspaper articles in which human translations
can be thought to be not very literal. The present study attempts to replicate
this research using a parliamentary corpus. The texts were translated from
French to English by three well-known neural machine translation systems:
DeepL, Google Translate and Microsoft Translator. The results confirm the
observations on the news corpus, but the differences are less strong. They
suggest that the use of text genres that usually result in more literal
translations, such as parliamentary corpora, might be preferable when comparing
human and machine translations. Regarding the differences between the three
neural machine systems, it appears that Google translations contain fewer
highly collocational bigrams, identified by the CollGram technique, than DeepL
and Microsoft translations.
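The CollGram technique scores each bigram of a text with association measures computed from a large reference corpus, typically mutual information (which favours rare, exclusive pairs) and the t-score (which favours frequent ones). A minimal sketch of the two measures follows; the counts are invented for illustration, whereas the real technique takes them from a reference corpus such as the BNC:

```python
import math

def bigram_association(f_xy, f_x, f_y, n):
    """Return (MI, t-score) for a bigram with joint count f_xy,
    individual word counts f_x and f_y, in a corpus of n tokens."""
    expected = f_x * f_y / n                  # count expected under independence
    mi = math.log2(f_xy / expected)           # pointwise mutual information
    t = (f_xy - expected) / math.sqrt(f_xy)   # t-score approximation
    return mi, t

# Hypothetical counts: a rare but strongly-associated bigram...
mi_rare, t_rare = bigram_association(f_xy=30, f_x=50, f_y=60, n=1_000_000)
# ...versus a bigram made of two high-frequency words.
mi_freq, t_freq = bigram_association(f_xy=1200, f_x=20_000, f_y=30_000, n=1_000_000)

# MI ranks the rare bigram higher; the t-score ranks the frequent one higher.
print(f"rare bigram:     MI={mi_rare:.1f}, t={t_rare:.1f}")
print(f"frequent bigram: MI={mi_freq:.1f}, t={t_freq:.1f}")
```

This split between the two measures is exactly what lets the studies above distinguish high-frequency collocations from lower-frequency, strongly-associated ones.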
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement [38.89496608319392]
We propose an algorithm to reorder and refine the target side of a full sentence translation corpus.
The words/phrases between the source and target sentences are aligned largely monotonically, using word alignment and non-autoregressive neural machine translation.
The proposed approach improves BLEU scores and resulting translations exhibit enhanced monotonicity with source sentences.
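The "enhanced monotonicity" reported above can be quantified as the rank correlation between the source and target positions of aligned words. A hedged sketch using Kendall's tau; the alignment pairs here are invented, whereas the paper obtains them from a word aligner:

```python
def kendall_tau(alignment):
    """Kendall's tau over (source_pos, target_pos) pairs:
    1.0 means a perfectly monotonic alignment, -1.0 a fully reversed one."""
    pairs = sorted(alignment)            # order links by source position
    targets = [t for _, t in pairs]
    concordant = discordant = 0
    for i in range(len(targets)):
        for j in range(i + 1, len(targets)):
            if targets[i] < targets[j]:
                concordant += 1
            elif targets[i] > targets[j]:
                discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 1.0

# Hypothetical alignments before and after reordering the target side.
original  = [(0, 2), (1, 0), (2, 3), (3, 1)]   # crossing links
reordered = [(0, 0), (1, 1), (2, 2), (3, 3)]   # fully monotonic

print(kendall_tau(original))
print(kendall_tau(reordered))
```

A reordered-and-refined corpus should move scores like these toward 1.0 while preserving translation quality.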
arXiv Detail & Related papers (2021-10-18T22:51:21Z)
- Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation [0.0]
A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly-associated formulaic sequences.
These differences were statistically significant and the effect sizes were almost always medium or large.
arXiv Detail & Related papers (2021-07-08T06:30:35Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
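XMI compares how many bits a target-side language model needs to encode the reference translation with how many a translation model needs once the source is given; the difference is the information the source contributes. A toy sketch with made-up per-token log-probabilities, whereas a real computation would score the same reference with trained LM and MT models:

```python
def cross_entropy(logprobs):
    """Average negative log2-probability per token (bits/token)."""
    return -sum(logprobs) / len(logprobs)

def xmi(lm_logprobs, mt_logprobs):
    """Cross-mutual information: bits per token the source saves the
    MT model relative to a target-only language model."""
    return cross_entropy(lm_logprobs) - cross_entropy(mt_logprobs)

# Hypothetical per-token log2-probabilities of one reference translation.
lm_scores = [-6.0, -5.0, -7.0, -4.0]   # monolingual LM, no source
mt_scores = [-2.0, -1.5, -3.0, -1.0]   # MT model conditioned on the source

print(f"XMI = {xmi(lm_scores, mt_scores):.3f} bits/token")
```

Because the two directions of a language pair use different models, XMI is naturally asymmetric, which is what allows the out-of-English versus into-English comparison.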
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
- Neural Machine Translation System of Indic Languages -- An Attention based Approach [0.5139874302398955]
In India, almost all languages originate from a common ancestral language, Sanskrit.
In this paper, we present a neural machine translation (NMT) system that can efficiently translate Indic languages such as Hindi and Gujarati.
arXiv Detail & Related papers (2020-02-02T07:15:18Z)
- Urdu-English Machine Transliteration using Neural Networks [0.0]
We present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language-independent.
The system learns patterns and out-of-vocabulary words from a parallel corpus, so there is no need to train it explicitly on a transliteration corpus.
arXiv Detail & Related papers (2020-01-12T17:30:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.