Evaluating Optimal Reference Translations
- URL: http://arxiv.org/abs/2311.16787v2
- Date: Fri, 8 Mar 2024 12:21:12 GMT
- Title: Evaluating Optimal Reference Translations
- Authors: Vilém Zouhar, Věra Kloudová, Martin Popel, Ondřej Bojar
- Abstract summary: We propose a methodology for creating more reliable document-level human reference translations.
We evaluate the obtained document-level optimal reference translations in comparison with "standard" ones.
- Score: 4.956416618428049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The overall translation quality reached by current machine translation (MT)
systems for high-resourced language pairs is remarkably good. Standard methods
of evaluation are neither suitable nor intended to uncover the many translation
errors and quality deficiencies that still persist. Furthermore, the quality of
standard reference translations is commonly questioned and comparable quality
levels have been reached by MT alone in several language pairs. Navigating
further research in these high-resource settings is thus difficult. In this
article, we propose a methodology for creating more reliable document-level
human reference translations, called "optimal reference translations," with the
simple aim to raise the bar of what should be deemed "human translation
quality." We evaluate the obtained document-level optimal reference
translations in comparison with "standard" ones, confirming a significant
quality increase and also documenting the relationship between evaluation and
translation editing.
Related papers
- Can Automatic Metrics Assess High-Quality Translations? [28.407966066693334]
We show that current metrics are insensitive to nuanced differences in translation quality.
This effect is most pronounced when the quality is high and the variance among alternatives is low.
Using the MQM framework as the gold standard, we systematically stress-test the ability of current metrics to identify translations with no errors as marked by humans.
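As an illustration of this kind of stress test (a hypothetical sketch, not the paper's actual code), one can check how well a metric's scores separate translations that human MQM annotators marked as error-free from those with at least one error, for instance via a simple ROC-AUC-style statistic:
```python
# Hypothetical stress test: does a metric's score separate error-free
# translations (per human MQM annotation) from flawed ones?
# AUC = probability that a random error-free translation outscores
# a random flawed one (0.5 = chance, 1.0 = perfect separation).

def separation_auc(scores, is_error_free):
    """scores: metric scores; is_error_free: bools from MQM annotation."""
    pos = [s for s, ok in zip(scores, is_error_free) if ok]
    neg = [s for s, ok in zip(scores, is_error_free) if not ok]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a high-quality pool with low score variance.
scores = [0.85, 0.86, 0.85, 0.84, 0.85]
error_free = [True, False, True, False, True]
print(f"AUC = {separation_auc(scores, error_free):.2f}")
# AUC = 0.50: the metric cannot tell the two groups apart.
```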
arXiv Detail & Related papers (2024-05-28T16:44:02Z)
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
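A minimal sketch of the pairwise (Bradley-Terry) objective commonly used to train RLHF reward models may help here; this is a generic illustration under assumed inputs, not the paper's exact implementation:
```python
# Pairwise reward-model loss: the reward of the preferred translation
# should exceed that of the rejected one. Assumes PyTorch; the example
# reward values are placeholders.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy check: correctly ordered rewards yield a small loss.
r_good = torch.tensor([2.0, 1.5])
r_bad = torch.tensor([0.5, 1.0])
print(preference_loss(r_good, r_bad))  # ≈ 0.34
```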
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
- Quality and Quantity of Machine Translation References for Automatic Metrics [4.824118883700288]
Higher-quality references lead to better metric correlations with humans at the segment level.
References from vendors of different quality levels can be mixed together to improve metric success.
These findings can be used by evaluators of shared tasks when references need to be created under a certain budget.
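The kind of comparison the paper describes can be sketched as follows (a hedged illustration with placeholder score arrays, not the paper's data):
```python
# Segment-level Pearson correlation between a metric's scores
# (computed against two reference sets of different quality) and
# human judgments. All score values below are illustrative.
from scipy.stats import pearsonr

human = [0.9, 0.4, 0.7, 0.2, 0.8]                  # human segment scores
metric_good_ref = [0.85, 0.35, 0.75, 0.30, 0.80]   # vs. high-quality refs
metric_poor_ref = [0.70, 0.60, 0.65, 0.55, 0.75]   # vs. low-quality refs

for name, scores in [("high-quality refs", metric_good_ref),
                     ("low-quality refs", metric_poor_ref)]:
    r, _ = pearsonr(human, scores)
    print(f"{name}: r = {r:.2f}")
```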
arXiv Detail & Related papers (2024-01-02T16:51:17Z)
- On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how best to utilize a context-aware translation model in decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that occur without the model being aware of them.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
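One way such a self-estimator could be attached is sketched below; this is an assumed architecture for illustration only, not the paper's actual design:
```python
# Minimal sketch: a small self-estimator head on top of decoder states
# that predicts the model's own translation quality alongside the
# usual token logits. Dimensions and pooling are assumptions.
import torch
import torch.nn as nn

class SelfEstimatingDecoderHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.lm_head = nn.Linear(d_model, vocab_size)   # token prediction
        self.quality_head = nn.Sequential(              # competency estimate
            nn.Linear(d_model, d_model), nn.Tanh(),
            nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, decoder_states: torch.Tensor):
        logits = self.lm_head(decoder_states)                # (B, T, V)
        quality = self.quality_head(decoder_states.mean(1))  # (B, 1)
        return logits, quality

head = SelfEstimatingDecoderHead(d_model=512, vocab_size=32000)
logits, quality = head(torch.randn(2, 10, 512))
print(logits.shape, quality.shape)  # [2, 10, 32000] and [2, 1]
```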
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to bring the TER-based artificial QE corpus closer to HJQE.
The results show that our proposed dataset is more consistent with human judgement and confirm the effectiveness of the proposed tag-correcting strategies.
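To see how TER-style artificial word tags arise in the first place, here is a simplified sketch (using difflib alignment as a stand-in for true TER alignment):
```python
# Words of the MT hypothesis that survive alignment to the reference
# get OK, the rest get BAD; human annotation (as in HJQE) can disagree
# with these automatic tags.
from difflib import SequenceMatcher

def word_tags(hypothesis: str, reference: str):
    hyp, ref = hypothesis.split(), reference.split()
    tags = ["BAD"] * len(hyp)
    sm = SequenceMatcher(a=hyp, b=ref)
    for block in sm.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return list(zip(hyp, tags))

print(word_tags("the cat sat on mat", "the cat sat on the mat"))
# [('the','OK'), ('cat','OK'), ('sat','OK'), ('on','OK'), ('mat','OK')]
```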
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- A Bayesian approach to translators' reliability assessment [0.0]
We treat the Translation Quality Assessment (TQA) process as a complex process, approaching it from the point of view of the physics of complex systems.
We build two Bayesian models that parameterise the features involved in the TQA process, namely the difficulty of the translation and the characteristics of the translators involved in producing the translation and assessing its quality.
We show that reviewers' reliability cannot be taken for granted even if they are expert translators.
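A hedged, much-simplified sketch of this style of modelling (not the paper's actual models): treat reviewer reliability as a Beta-Binomial estimation problem, where "agreements" are the reviewer's judgements that matched a consensus gold label:
```python
# Minimal Bayesian treatment of reviewer reliability. The prior and
# the counts below are illustrative assumptions.
from scipy import stats

prior_a, prior_b = 1.0, 1.0          # uniform Beta(1, 1) prior
agreements, judgements = 42, 60      # illustrative counts

posterior = stats.beta(prior_a + agreements,
                       prior_b + (judgements - agreements))
lo, hi = posterior.interval(0.95)
print(f"reliability ~ {posterior.mean():.2f} (95% CI {lo:.2f}-{hi:.2f})")
```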
arXiv Detail & Related papers (2022-03-14T14:29:45Z)
- HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation [0.0]
In this work, we introduce HOPE, a task-oriented and human-centric evaluation framework for machine translation output.
It contains only a limited number of commonly occurring error types and uses a scoring model with a geometric progression of error penalty points (EPPs), reflecting the severity level of each error, applied to each translation unit.
The approach has several key advantages: the ability to measure and compare less-than-perfect MT output from different systems, the ability to indicate human perception of quality, immediate estimation of the labour effort required to bring MT output to premium quality, low-cost and faster application, and higher inter-rater reliability (IRR).
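A sketch of how such a geometric-progression score could be computed (the base and ratio below are assumptions, not HOPE's published values):
```python
# HOPE-style scoring sketch: penalty points grow geometrically with
# error severity, and a translation unit's score is the sum of the
# penalties for its errors.
BASE, RATIO = 1, 2  # hypothetical: severity s costs BASE * RATIO**s EPPs

def unit_epp(error_severities):
    """Total error penalty points for one translation unit."""
    return sum(BASE * RATIO ** s for s in error_severities)

# A unit with one minor (severity 0) and one major (severity 2) error:
print(unit_epp([0, 2]))  # 1 + 4 = 5 EPPs
```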
arXiv Detail & Related papers (2021-12-27T18:47:43Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work investigates how to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology we apply is based on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
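A minimal sketch of this idea, under the assumption that each translated segment is treated as a Bernoulli trial (error / no error); the error rate and sample sizes are illustrative:
```python
# Monte Carlo estimate of how the 95% interval for an observed error
# rate shrinks as the sample of translated segments grows.
import random

def mc_interval(p: float, n: int, draws: int = 10_000):
    """Monte Carlo 95% interval for the observed error rate."""
    rates = sorted(sum(random.random() < p for _ in range(n)) / n
                   for _ in range(draws))
    return rates[int(0.025 * draws)], rates[int(0.975 * draws)]

random.seed(0)
for n in (50, 200, 800):
    lo, hi = mc_interval(p=0.10, n=n)
    print(f"n={n:4d}: 95% interval [{lo:.3f}, {hi:.3f}]")
```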
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
- Document-level Neural Machine Translation with Document Embeddings [82.4684444847092]
This work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings.
The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end.
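One assumed form such a source-side document clue could take (an illustration only, not the paper's implementation): pool an embedding of the whole document and add it to every source token embedding before the encoder:
```python
# Document-aware source embedding sketch. Mean-pooling over the
# document tokens is an assumption standing in for the paper's
# global/local clues.
import torch
import torch.nn as nn

class DocumentAwareEmbedding(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.doc_proj = nn.Linear(d_model, d_model)

    def forward(self, sent_ids: torch.Tensor, doc_ids: torch.Tensor):
        tok = self.tok(sent_ids)                      # (B, T, D)
        doc = self.tok(doc_ids).mean(dim=1)           # global document clue
        return tok + self.doc_proj(doc).unsqueeze(1)  # broadcast over T

emb = DocumentAwareEmbedding(vocab_size=32000, d_model=512)
out = emb(torch.randint(0, 32000, (2, 10)), torch.randint(0, 32000, (2, 80)))
print(out.shape)  # torch.Size([2, 10, 512])
```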
arXiv Detail & Related papers (2020-09-16T19:43:29Z)