Fine-grained Human Evaluation of Transformer and Recurrent Approaches to
Neural Machine Translation for English-to-Chinese
- URL: http://arxiv.org/abs/2006.08297v1
- Date: Mon, 15 Jun 2020 11:47:00 GMT
- Title: Fine-grained Human Evaluation of Transformer and Recurrent Approaches to
Neural Machine Translation for English-to-Chinese
- Authors: Yuying Ye, Antonio Toral
- Abstract summary: We develop an error taxonomy compliant with the Multidimensional Quality Metrics (MQM) framework.
We then conduct an error annotation using this customised error taxonomy on the output of state-of-the-art recurrent- and Transformer-based MT systems.
The resulting annotation shows that, compared to the best recurrent system, the best Transformer system results in a 31% reduction of the total number of errors.
- Score: 3.3453601632404073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research presents a fine-grained human evaluation to compare the
Transformer and recurrent approaches to neural machine translation (MT), on the
translation direction English-to-Chinese. To this end, we develop an error
taxonomy compliant with the Multidimensional Quality Metrics (MQM) framework
that is customised to the relevant phenomena of this translation direction. We
then conduct an error annotation using this customised error taxonomy on the
output of state-of-the-art recurrent- and Transformer-based MT systems on a
subset of WMT2019's news test set. The resulting annotation shows that,
compared to the best recurrent system, the best Transformer system results in a
31% reduction of the total number of errors and produces significantly fewer
errors in 10 out of 22 error categories. We also note that two of the systems
evaluated do not produce any errors in a category that was relevant for this
translation direction prior to the advent of NMT systems: Chinese classifiers.
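As a worked illustration of the headline arithmetic, here is a minimal Python sketch of how per-category MQM annotations could be tallied and the relative error reduction computed. The category names and counts below are invented for illustration (chosen so the totals reproduce a ~31% reduction); they are not the paper's data.

```python
from collections import Counter

def relative_reduction(baseline: int, system: int) -> float:
    """Relative error reduction of `system` over `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline

# Invented per-category MQM error counts (not the paper's data); the
# totals are chosen so the overall reduction comes out at ~31%.
recurrent = Counter({"Mistranslation": 58, "Omission": 21, "Grammar": 34})
transformer = Counter({"Mistranslation": 40, "Omission": 15, "Grammar": 23})

total_rnn, total_trf = sum(recurrent.values()), sum(transformer.values())
print(f"total: {total_rnn} vs {total_trf} -> "
      f"{relative_reduction(total_rnn, total_trf):.0f}% reduction")
for category in recurrent:
    reduction = relative_reduction(recurrent[category], transformer[category])
    print(category, f"{reduction:.0f}%")
```

The paper additionally tests per-category differences for statistical significance, which a sketch like this would need to add (e.g. bootstrap resampling over annotated segments).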
Related papers
- OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection [36.59354124910338]
OTTAWA is a word aligner specifically designed to enhance the detection of hallucinations and omissions in Machine Translation systems.
Our approach yields competitive results compared to state-of-the-art methods across 18 language pairs on the HalOmi benchmark.
arXiv Detail & Related papers (2024-06-04T03:00:55Z)
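The summary above does not spell out OTTAWA's formulation, so the following is only a generic optimal-transport-flavoured alignment sketch under stated assumptions: precomputed word embeddings, cosine distance as transport cost, and scipy's Hungarian solver standing in for a proper OT solver. The omission/hallucination heuristic at the end is likewise illustrative, not OTTAWA's actual detector.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align(src_emb: np.ndarray, tgt_emb: np.ndarray, max_cost: float = 0.5):
    """One-to-one word alignment minimising total cosine distance.

    A crude stand-in for an optimal-transport solver: rows are source
    words, columns are target words, both given as embedding matrices.
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    cost = 1.0 - src @ tgt.T                    # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)    # Hungarian algorithm
    pairs = [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
    # Illustrative heuristic only: source words left without a cheap
    # alignment suggest omissions; target words, hallucinations.
    omission_candidates = set(range(len(src))) - {i for i, _ in pairs}
    hallucination_candidates = set(range(len(tgt))) - {j for _, j in pairs}
    return pairs, omission_candidates, hallucination_candidates
```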
- Human Evaluation of English--Irish Transformer-Based NMT [2.648836772989769]
The best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.
When benchmarked against Google Translate, our translation engines demonstrated significant improvements.
arXiv Detail & Related papers (2024-03-04T11:45:46Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
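The exact AutoMQM prompt and parser are given in the paper; the sketch below only shows the general shape of an MQM-style error-identification prompt plus the common MQM severity weighting (minor = 1, major = 5). The prompt wording, `call_llm`, and `parse_errors` are hypothetical placeholders.

```python
MQM_PROMPT = """You are an expert {src_lang}-{tgt_lang} translator.
List every error in the translation below. For each error give the
erroneous span, an MQM category (e.g. accuracy/mistranslation,
accuracy/omission, fluency/grammar) and a severity (major or minor).
Answer "no errors" if the translation is perfect.

Source: {source}
Translation: {translation}
Errors:"""

def mqm_score(errors: list[tuple[str, str, str]]) -> float:
    """Common MQM weighting: each major error costs 5, each minor 1."""
    return -sum(5.0 if severity == "major" else 1.0
                for _span, _category, severity in errors)

prompt = MQM_PROMPT.format(src_lang="English", tgt_lang="Chinese",
                           source="The cat sat on the mat.",
                           translation="猫坐在垫子上。")
# `call_llm` and `parse_errors` are hypothetical placeholders:
# score = mqm_score(parse_errors(call_llm(prompt)))
```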
- BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training [64.37683359609308]
In this study, we analyze various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems.
We find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore.
In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm.
arXiv Detail & Related papers (2023-07-06T16:59:30Z)
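For readers unfamiliar with minimum risk training, the sampled-subset risk it minimises (following Shen et al., 2016) looks roughly like the sketch below; the smoothing constant `alpha` and the use of `1 - metric` as cost are conventional choices, not details taken from this paper.

```python
import numpy as np

def mrt_risk(logprobs: np.ndarray, costs: np.ndarray,
             alpha: float = 5e-3) -> float:
    """Expected cost under the model, renormalised over sampled candidates.

    logprobs: model log-probabilities of N sampled translations.
    costs:    per-candidate cost, e.g. 1 - BLEURT(candidate, reference).
    alpha:    smoothness hyperparameter (Shen et al., 2016).
    """
    scaled = alpha * logprobs
    q = np.exp(scaled - scaled.max())   # numerically stable softmax
    q /= q.sum()
    return float(np.dot(q, costs))
```

A "universal adversarial translation" in the paper's sense is a single candidate that keeps this risk low against essentially any input, which is exactly the failure mode MRT surfaces.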
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- Minimum Bayes Risk Decoding with Neural Metrics of Translation Quality [16.838064121696274]
This work applies Minimum Bayes Risk decoding to optimize diverse automated metrics of translation quality.
Experiments show that the combination of a neural translation model with a neural reference-based metric, BLEURT, results in significant improvement in automatic and human evaluations.
arXiv Detail & Related papers (2021-11-17T20:48:02Z)
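A minimal sketch of MBR decoding itself: each sampled candidate is scored against the others as pseudo-references, and the candidate with the highest expected utility wins. The toy `overlap` utility below merely stands in for a neural metric such as BLEURT.

```python
def mbr_decode(candidates: list[str], utility) -> str:
    """Pick the candidate with the highest average utility vs. the others."""
    def expected_utility(hyp: str) -> float:
        refs = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, ref) for ref in refs) / len(refs)
    return max(candidates, key=expected_utility)

# Trivially crude utility (unigram Jaccard overlap); the paper uses
# BLEURT, which would be called here instead.
def overlap(hyp: str, ref: str) -> float:
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

best = mbr_decode(["the cat sat", "a cat sat", "the dog ran"], overlap)
print(best)  # -> "the cat sat", the consensus candidate
```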
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Machine Translation of Novels in the Age of Transformer [1.6453685972661827]
We build a machine translation system tailored to the literary domain, specifically to novels, based on the state-of-the-art architecture in neural MT (NMT), the Transformer, for the translation direction English-to-Catalan.
We compare this MT system against three other systems (two domain-specific systems under the recurrent and phrase-based paradigms and a popular generic on-line system) on three evaluations.
As expected, the domain-specific Transformer-based system outperformed the other three systems in all three evaluations conducted, in all cases by a large margin.
arXiv Detail & Related papers (2020-11-30T16:51:08Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where source texts are directly compared to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
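The reference-free setup this paper studies reduces, in its simplest form, to embedding source and hypothesis in a shared cross-lingual space and taking their similarity. In the sketch below, `embed` is a hypothetical placeholder for an M-BERT- or LASER-style sentence encoder; the paper's finding is precisely that scores of this form correlate poorly with translation quality.

```python
import numpy as np

def reference_free_score(src_vec: np.ndarray, hyp_vec: np.ndarray) -> float:
    """Cosine similarity between source and hypothesis embeddings."""
    return float(src_vec @ hyp_vec
                 / (np.linalg.norm(src_vec) * np.linalg.norm(hyp_vec)))

# `embed` is a hypothetical cross-lingual sentence encoder (think
# LASER, or mean-pooled M-BERT states) returning a fixed-size vector:
# score = reference_free_score(embed("The cat sat."), embed("猫坐下了。"))
```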
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
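For context, the positional encoding this summary refers to is the standard sinusoidal scheme from the original Transformer (Vaswani et al., 2017); a sketch is below, assuming an even `d_model`. The paper's explicit reordering mechanism itself is not reproduced here.

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); odd dims use cos."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model // 2)[None, :]        # (1, d_model/2)
    angles = positions / np.power(10000.0, 2 * dims / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)
    encoding[:, 1::2] = np.cos(angles)
    return encoding
```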
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.