Unsupervised Quality Estimation for Neural Machine Translation
- URL: http://arxiv.org/abs/2005.10608v2
- Date: Mon, 20 Jul 2020 08:37:22 GMT
- Title: Unsupervised Quality Estimation for Neural Machine Translation
- Authors: Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain,
Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia
Specia
- Abstract summary: Existing approaches require large amounts of expert annotated data, computation and time for training.
We devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required.
We achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models.
- Score: 63.38918378182266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality Estimation (QE) is an important component in making Machine
Translation (MT) useful in real-world applications, as it aims to inform the
user of the quality of the MT output at test time. Existing approaches require
large amounts of expert-annotated data, computation and time for training. As
an alternative, we devise an unsupervised approach to QE where no training or
access to additional resources besides the MT system itself is required.
Unlike most current work, which treats the MT system as a black box, we
explore useful information that can be extracted from the MT system as a
by-product of translation. By employing methods for uncertainty
quantification, we achieve very good correlation with human judgments of
quality, rivalling state-of-the-art supervised QE models. To evaluate our
approach, we collect the first dataset that enables work on both black-box and
glass-box approaches to QE.
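One of the glass-box signals the abstract alludes to can be computed by scoring the model's own output under dropout noise. The sketch below is a hedged illustration rather than the authors' implementation: it computes a dropout-averaged sequence log-probability (in the spirit of the paper's D-TP measure) for a generic PyTorch encoder-decoder whose forward call returns per-token logits; that model interface is an assumption.

```python
import torch

def sequence_logprob(model, src, tgt):
    """Length-normalised log-probability of tgt given src (teacher-forced)."""
    logits = model(src, tgt[:, :-1])               # assumed shape: (1, T-1, vocab)
    logp = torch.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, tgt[:, 1:].unsqueeze(-1)).squeeze(-1)
    return tok_logp.mean().item()

@torch.no_grad()
def d_tp(model, src, tgt, n_passes=30):
    """Average sequence log-prob over stochastic forward passes with dropout on."""
    model.train()                                   # train() keeps dropout active
    scores = [sequence_logprob(model, src, tgt) for _ in range(n_passes)]
    model.eval()
    return sum(scores) / len(scores)
```

Keeping the model in train() mode during scoring is what turns dropout into an uncertainty estimator: the mean (and spread) of the per-pass scores reflects how stable the model's belief in its own translation is.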
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation [14.405862891194344]
We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model's training data using $k$-nearest neighbors.
Measuring the performance of model-specific QE is not straightforward, since such systems provide quality scores on their own MT output.
We propose an automatic evaluation method that uses quality scores from reference-based metrics as gold standard instead of human-generated ones.
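As a rough illustration of the retrieval idea, the sketch below scores a hypothesis by the distance of its token representations to a datastore of decoder states gathered from the MT model's training data. The datastore construction, feature choice and aggregation here are assumptions, not the exact $k$NN-QE recipe.

```python
import numpy as np

def knn_qe_score(datastore, hyp_states, k=8):
    """Mean distance from each hypothesis-token representation to its k
    nearest neighbours in the training-data datastore; lower distances
    suggest the MT model has seen similar contexts, i.e. higher quality."""
    # pairwise L2 distances: (T hypothesis tokens) x (N datastore entries)
    dists = np.linalg.norm(hyp_states[:, None, :] - datastore[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, :k]
    return -float(knn.mean())                       # negate: higher score = better
```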
arXiv Detail & Related papers (2024-04-27T23:52:51Z)
- Multi-Dimensional Machine Translation Evaluation: Model Evaluation and Resource for Korean [7.843029855730508]
We develop a 1200-sentence MQM evaluation benchmark for the language pair English-Korean.
We find that the reference-free setup outperforms its reference-based counterpart in the style dimension.
Overall, RemBERT emerges as the most promising model.
arXiv Detail & Related papers (2024-03-19T12:02:38Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
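A hedged sketch of what an AutoMQM-style prompt and scorer could look like follows; the prompt wording is illustrative and not copied from the paper, and only the -1/-5 severity weights follow the standard MQM convention.

```python
# Hypothetical prompt template, meant to be filled with str.format().
AUTOMQM_PROMPT = """Identify the errors in the following translation and
classify each one. Report the error span, a category (e.g. accuracy,
fluency, terminology, style, locale), and a severity (minor or major).

Source ({src_lang}): {source}
Translation ({tgt_lang}): {translation}
Errors:"""

def mqm_score(errors):
    """Convert (span, category, severity) annotations into an MQM-style
    score: -1 per minor error, -5 per major error."""
    return sum(-5 if severity == "major" else -1
               for _span, _category, severity in errors)
```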
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
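The idea lends itself to a compact sketch: perturb one source position with several alternative fillers, re-translate, and see which output words survive. The `translate` interface and the naive token-overlap check below are assumptions; the actual method uses more careful perturbations and alignment.

```python
from collections import Counter

def word_stability(translate, src_tokens, position, fillers):
    """Re-translate the source with each filler substituted at `position`
    and measure, for every token of the original translation, how often
    it survives; unstable tokens point at low-confidence output words."""
    base = translate(" ".join(src_tokens)).split()
    survived = Counter()
    for filler in fillers:
        perturbed = src_tokens[:position] + [filler] + src_tokens[position + 1:]
        out = set(translate(" ".join(perturbed)).split())
        survived.update(tok for tok in set(base) if tok in out)
    return {tok: survived[tok] / len(fillers) for tok in set(base)}
```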
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the downstream outcomes in this extrinsic evaluation.
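As a minimal illustration of the segment-level check being described, one can correlate per-segment metric scores with binary downstream outcomes; the toy data and the choice of a point-biserial correlation here are ours, not necessarily the paper's statistic.

```python
from scipy.stats import pointbiserialr

# toy data: per-segment metric scores and binary downstream task outcomes
metric_scores = [0.71, 0.42, 0.88, 0.30, 0.65, 0.51]
task_success = [1, 0, 1, 0, 0, 1]

r, p = pointbiserialr(task_success, metric_scores)
print(f"point-biserial r = {r:.3f} (p = {p:.3f})")
```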
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that occur without the system being aware of them.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work carries out motivated research to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology applied in this work draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
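The interval estimation the summary points to can be made concrete: for n translated segments of which s are judged acceptable, Brown et al. (2001) recommend intervals such as the Wilson score interval over the naive Wald interval. A small worked sketch, with 80 acceptable segments out of 100 as an assumed example:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial (Bernoulli) proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(wilson_interval(80, 100))  # ~ (0.711, 0.867)
```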
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
- Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation [25.325624543852086]
We propose a general methodology for adversarial testing of Quality Estimation for Machine Translation (MT) systems.
We show that, despite the high correlation with human judgements achieved by recent SOTA models, certain types of meaning errors remain problematic for QE systems to detect.
Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance.
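In code, the core adversarial check reduces to a ranking test over paired perturbations; the `qe_score` interface below is an assumed placeholder for any sentence-level QE model.

```python
def passes_adversarial_test(qe_score, src, meaning_preserving, meaning_altering):
    """True if the QE model ranks the meaning-preserving variant of the
    MT output above the meaning-altering one for the same source."""
    return qe_score(src, meaning_preserving) > qe_score(src, meaning_altering)

# e.g. meaning_preserving = synonym swap in the MT output,
#      meaning_altering   = inserted negation or a swapped named entity
```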
arXiv Detail & Related papers (2021-09-22T17:32:18Z)
- Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation [14.469503103015668]
We propose a framework that fuses uncertainty quantification features into a pre-trained cross-lingual language model to predict translation quality.
Experimental results show that our method achieves state-of-the-art performance on the datasets of the WMT 2020 QE shared task.
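A speculative sketch of the fusion idea: concatenate a handful of glass-box uncertainty features (e.g. mean and variance of dropout-pass log-probabilities) with a pooled cross-lingual sentence representation and regress to a quality score. Dimensions and the head architecture are illustrative, not the authors' exact model.

```python
import torch
import torch.nn as nn

class FusionQE(nn.Module):
    """Regression head over [pooled encoder state ; uncertainty features]."""
    def __init__(self, enc_dim=1024, n_unc_feats=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(enc_dim + n_unc_feats, 256),
            nn.Tanh(),
            nn.Linear(256, 1),
        )

    def forward(self, pooled_enc, unc_feats):
        # pooled_enc: (B, enc_dim), e.g. from XLM-R; unc_feats: (B, n_unc_feats)
        return self.head(torch.cat([pooled_enc, unc_feats], dim=-1)).squeeze(-1)
```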
arXiv Detail & Related papers (2021-09-15T08:05:13Z)