A Set of Recommendations for Assessing Human-Machine Parity in Language
Translation
- URL: http://arxiv.org/abs/2004.01694v1
- Date: Fri, 3 Apr 2020 17:49:56 GMT
- Title: A Set of Recommendations for Assessing Human-Machine Parity in Language
Translation
- Authors: Samuel Läubli and Sheila Castilho and Graham Neubig and Rico Sennrich and Qinlan Shen and Antonio Toral
- Abstract summary: We reassess Hassan et al.'s investigation into Chinese to English news translation.
We show that the professional human translations contained significantly fewer errors.
- Score: 87.72302201375847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quality of machine translation has increased remarkably over the past
years, to the degree that it was found to be indistinguishable from
professional human translation in a number of empirical investigations. We
reassess Hassan et al.'s 2018 investigation into Chinese to English news
translation, showing that the finding of human-machine parity was owed to
weaknesses in the evaluation design - which is currently considered best
practice in the field. We show that the professional human translations
contained significantly fewer errors, and that perceived quality in human
evaluation depends on the choice of raters, the availability of linguistic
context, and the creation of reference translations. Our results call for
revisiting current best practices to assess strong machine translation systems
in general and human-machine parity in particular, for which we offer a set of
recommendations based on our empirical findings.
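As a purely illustrative sketch of the kind of comparison the abstract refers to (not the authors' actual analysis), the snippet below runs a paired permutation test on per-document error counts for human versus machine output; all counts, names, and the choice of test are hypothetical.

```python
import random

def paired_permutation_test(errors_human, errors_machine, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-document error counts.

    errors_human / errors_machine: error counts for the same documents,
    e.g. from an error-annotation pass (hypothetical data).
    """
    rng = random.Random(seed)
    diffs = [h - m for h, m in zip(errors_human, errors_machine)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each per-document difference,
        # simulating the null hypothesis of no systematic difference.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(resampled) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical per-document error counts, for illustration only.
human = [2, 1, 0, 3, 1, 2, 0, 1]
machine = [4, 3, 1, 5, 2, 4, 2, 3]
print(f"p ~ {paired_permutation_test(human, machine):.4f}")
```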
Related papers
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and the human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
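The correlation reported in the entry above can be illustrated with a generic system-level computation; the sketch below is not BiVert itself (the BabelNet-based sense distance is not reproduced), and all scores are made up.

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical system-level scores: one metric score and one human score per MT system.
metric_scores = [0.71, 0.64, 0.58, 0.52]
human_scores = [83.0, 78.5, 74.0, 69.5]
print(f"r = {pearson(metric_scores, human_scores):.3f}")
```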
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
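A minimal sketch of the pairwise preference loss commonly used to train reward models in RLHF (not the paper's actual model); the embeddings here are random stand-ins for a real encoder over (source, translation) pairs.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy scalar reward head over sentence embeddings (stand-in for a full LM)."""

    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, features):
        return self.score(features).squeeze(-1)

# Hypothetical embeddings of (source, translation) pairs: "chosen" is the
# translation the annotator preferred, "rejected" the other one.
dim, batch = 256, 8
chosen = torch.randn(batch, dim)
rejected = torch.randn(batch, dim)

model = RewardModel(dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Standard pairwise preference loss: push the chosen translation's reward
# above the rejected one's (Bradley-Terry style, as commonly used in RLHF).
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.3f}")
```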
- Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect incorrect translations and assigns a penalty term to their reward scores.
arXiv Detail & Related papers (2024-01-23T16:07:43Z)
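A minimal sketch of that penalty mechanism; the specific rules and the penalty value below are assumptions for illustration, not the ones used in the paper.

```python
def rule_flags(source: str, translation: str) -> bool:
    """Toy rules for obviously broken translations: empty output, extreme
    length ratio, or heavy repetition. Purely illustrative."""
    if not translation.strip():
        return True
    ratio = len(translation.split()) / max(len(source.split()), 1)
    if ratio < 0.3 or ratio > 3.0:
        return True
    words = translation.split()
    if len(words) >= 4 and len(set(words)) / len(words) < 0.4:
        return True
    return False

def penalized_reward(qe_score: float, source: str, translation: str,
                     penalty: float = 1.0) -> float:
    """Use the QE score as the reward, minus a fixed penalty when a rule fires."""
    return qe_score - penalty if rule_flags(source, translation) else qe_score

# A repetitive output gets flagged and its reward is pushed down.
print(penalized_reward(0.8, "the cat sat on the mat", "le chat le chat le chat le chat"))
```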
- Iterative Translation Refinement with Large Language Models [25.90607157524168]
We propose iteratively prompting a large language model to self-correct a translation.
We also discuss the challenges in evaluation and the relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
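A minimal sketch of such a self-correction loop, assuming a hypothetical query_llm client and an invented prompt wording (neither is taken from the paper).

```python
def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (any chat/completions client); hypothetical."""
    raise NotImplementedError("plug in an LLM client here")

def iterative_refine(source: str, rounds: int = 3) -> str:
    """Translate once, then repeatedly ask the model to improve its own output.
    The prompt wording is an assumption, not the paper's exact prompt."""
    translation = query_llm(f"Translate into English:\n{source}")
    for _ in range(rounds):
        improved = query_llm(
            "Improve the following translation. "
            "Only change it if you can make it more fluent and accurate.\n"
            f"Source: {source}\nTranslation: {translation}"
        )
        if improved.strip() == translation.strip():
            break  # converged: the model no longer changes its output
        translation = improved
    return translation
```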
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
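A minimal sketch of budget-constrained utterance selection in this spirit; the uncertainty function is a hypothetical stand-in for whatever selection criterion the paper actually uses.

```python
from typing import Callable, Dict, List

def split_for_annotation(utterances: List[str],
                         uncertainty: Callable[[str], float],
                         human_budget: int) -> Dict[str, List[str]]:
    """Send the utterances the current model is least sure about to human
    translators and machine-translate the rest."""
    ranked = sorted(utterances, key=uncertainty, reverse=True)
    return {
        "human_translation": ranked[:human_budget],
        "machine_translation": ranked[human_budget:],
    }

# Toy usage: longer utterances pretend to be harder.
pool = ["set an alarm", "remind me to call mom after the meeting tomorrow", "play jazz"]
print(split_for_annotation(pool, uncertainty=lambda u: len(u.split()), human_budget=1))
```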
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failing without any awareness of the quality of its own output.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
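One plausible reading of "self-estimator" is an auxiliary head that scores the decoder's own output; the toy sketch below follows that reading and is an assumption, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class SelfEstimatorHead(nn.Module):
    """Toy auxiliary head mapping pooled decoder states to a scalar quality score.
    The name and design are hypothetical, for illustration only."""

    def __init__(self, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, decoder_states, mask):
        # Mean-pool decoder states over non-padding positions, then score.
        mask = mask.unsqueeze(-1).float()
        pooled = (decoder_states * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return torch.sigmoid(self.mlp(pooled)).squeeze(-1)

# Hypothetical decoder output for a batch of 2 translations, 7 target tokens each.
states = torch.randn(2, 7, 512)
mask = torch.ones(2, 7, dtype=torch.bool)
print(SelfEstimatorHead()(states, mask))  # one estimated quality score per sentence
```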
- Consistent Human Evaluation of Machine Translation across Language Pairs [21.81895199744468]
We propose a new metric called XSTS that is more focused on semantic equivalence, together with a cross-lingual calibration method.
We demonstrate the effectiveness of these novel contributions in large-scale evaluation studies across up to 14 language pairs.
arXiv Detail & Related papers (2022-05-17T17:57:06Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation [19.116396693370422]
We propose an evaluation methodology grounded in explicit error analysis, based on the Multidimensional Quality Metrics framework.
We carry out the largest MQM research study to date, scoring the outputs of top systems from the WMT 2020 shared task in two language pairs.
We analyze the resulting data extensively, finding among other results a substantially different ranking of evaluated systems from the one established by the WMT crowd workers.
arXiv Detail & Related papers (2021-04-29T16:42:09Z)
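A toy illustration of MQM-style weighted error scoring as used in this line of work; the weights below (minor = 1, major = 5, non-translation = 25, minor punctuation = 0.1) are commonly cited values and should be treated as assumptions rather than the study's exact configuration.

```python
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "non-translation": 25.0}
MINOR_PUNCTUATION_WEIGHT = 0.1

def mqm_segment_score(errors):
    """Sum of error weights for one segment; higher means worse.

    `errors` is a list of (severity, category) pairs from an expert annotator,
    e.g. [("major", "accuracy/mistranslation")].
    """
    total = 0.0
    for severity, category in errors:
        if severity == "minor" and category == "fluency/punctuation":
            total += MINOR_PUNCTUATION_WEIGHT
        else:
            total += SEVERITY_WEIGHTS.get(severity, 0.0)
    return total

print(mqm_segment_score([("major", "accuracy/mistranslation"),
                         ("minor", "fluency/punctuation")]))  # 5.1
```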
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.