Automatic Classification of Human Translation and Machine Translation: A
Study from the Perspective of Lexical Diversity
- URL: http://arxiv.org/abs/2105.04616v1
- Date: Mon, 10 May 2021 18:55:04 GMT
- Title: Automatic Classification of Human Translation and Machine Translation: A
Study from the Perspective of Lexical Diversity
- Authors: Yingxue Fu, Mark-Jan Nederhof
- Abstract summary: We show that machine translation and human translation can be classified with an accuracy above chance level.
The classification accuracy for machine translation is much higher than that for human translation.
- Score: 1.5229257192293197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By using a trigram model and fine-tuning a pretrained BERT model for sequence
classification, we show that machine translation and human translation can be
classified with an accuracy above chance level, which suggests that machine
translation and human translation are different in a systematic way. The
classification accuracy of machine translation is much higher than that of human
translation. We show that this may be explained by the difference in lexical
diversity between machine translation and human translation. If machine
translation has independent patterns from human translation, automatic metrics
which measure the deviation of machine translation from human translation may
conflate difference with quality. Our experiment with two different types of
automatic metrics shows correlation with the result of the classification task.
Therefore, we suggest the difference in lexical diversity between machine
translation and human translation be given more attention in machine
translation evaluation.
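The abstract names three ingredients: a trigram model, a pretrained BERT model fine-tuned for sequence classification, and a comparison of lexical diversity. The sketch below illustrates the latter two under stated assumptions: it is not the authors' exact setup, the type-token ratio stands in as one simple lexical-diversity proxy, and the model name, toy sentences, label mapping (0 = human translation, 1 = machine translation) and hyperparameters are illustrative only.

    # Minimal sketch (assumptions noted above), using Hugging Face Transformers.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    def type_token_ratio(tokens):
        """One simple lexical-diversity proxy: unique types / total tokens."""
        return len(set(tokens)) / max(len(tokens), 1)

    # Toy examples standing in for human translation (0) and machine translation (1).
    texts = ["the cat sat on a worn straw mat", "the cat sat on the cat mat mat"]
    labels = [0, 1]
    print([round(type_token_ratio(t.split()), 2) for t in texts])

    # Binary sequence classification: does a sentence come from HT or MT?
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
    loader = DataLoader(dataset, batch_size=2, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for input_ids, attention_mask, y in loader:  # a single pass over the toy data
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

In practice the classifier would be trained on corpora of human and machine translations of the same source texts, and the type-token ratio (or a length-robust measure such as MTLD) would be compared between the two classes; the choice of measure here is an assumption, not the paper's reported protocol.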
Related papers
- An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation [40.08063412966712]
Massively multilingual neural machine translation (MMNMT) has been proven to enhance the translation quality of low-resource languages.
We create a robustness evaluation benchmark dataset for Indonesian-Chinese translation.
This dataset is automatically translated into Chinese using four NLLB-200 models of different sizes.
arXiv Detail & Related papers (2024-05-13T12:01:54Z)
- Incorporating Human Translator Style into English-Turkish Literary Machine Translation [0.26168876987285306]
We develop machine translation models that take into account the stylistic features of translators.
We show that the human translator's style can be recreated to a large extent in the target machine translations.
arXiv Detail & Related papers (2023-07-21T09:39:50Z)
- Measuring Sentiment Bias in Machine Translation [1.567333808864147]
Biases induced in text by generative models have become an increasingly prominent topic in recent years.
We compare three open access machine translation models for five different languages on two parallel corpora.
Though our statistical tests indicate shifts in the label probability distributions, we find none consistent enough to suggest a bias induced by the translation process.
arXiv Detail & Related papers (2023-06-12T14:40:29Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in this area focuses on multilingual models rather than the machine translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Using CollGram to Compare Formulaic Language in Human and Neural Machine Translation [0.0]
A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences.
These differences were statistically significant and the effect sizes were almost always medium or large.
arXiv Detail & Related papers (2021-07-08T06:30:35Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT.
Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
- A Set of Recommendations for Assessing Human-Machine Parity in Language Translation [87.72302201375847]
We reassess Hassan et al.'s investigation into Chinese to English news translation.
We show that the professional human translations contained significantly fewer errors.
arXiv Detail & Related papers (2020-04-03T17:49:56Z)