QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task
- URL: http://arxiv.org/abs/2112.14890v1
- Date: Thu, 30 Dec 2021 02:27:29 GMT
- Title: QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task
- Authors: Jiayi Wang, Ke Wang, Boxing Chen, Yu Zhao, Weihua Luo, and Yuqi Zhang
- Abstract summary: We present our submissions to the WMT 2021 QE shared task.
We propose several useful features to evaluate the uncertainty of the translations to build our QE system, named QEMind.
We show that our multilingual systems outperform the best system in the Direct Assessment QE task of WMT 2020.
- Score: 24.668012925628968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality Estimation, as a crucial step of quality control for machine
translation, has been explored for years. The goal is to investigate automatic
methods for estimating the quality of machine translation results without
reference translations. In this year's WMT QE shared task, we utilize the
large-scale XLM-Roberta pre-trained model and additionally propose several
useful features to evaluate the uncertainty of the translations to build our QE
system, named QEMind. The system has been applied to the
sentence-level scoring task of Direct Assessment and the binary score
prediction task of Critical Error Detection. In this paper, we present our
submissions to the WMT 2021 QE shared task; an extensive set of experimental
results shows that our multilingual systems outperform the best system in the
Direct Assessment QE task of WMT 2020.
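As an illustration of this kind of architecture, below is a minimal sketch of a sentence-level QE regressor: XLM-RoBERTa encodes the source-translation pair, and per-sentence uncertainty features are concatenated with the sentence representation before a small regression head. The class name, feature count, and feature values are hypothetical assumptions, not the authors' released implementation.

```python
# A minimal sketch (not the authors' code): an XLM-RoBERTa sentence-level QE
# regressor that concatenates hypothetical uncertainty features (e.g., MT
# log-probabilities, Monte Carlo dropout variance) before the regression head.
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

class QERegressor(nn.Module):
    def __init__(self, n_uncertainty_feats: int = 4):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
        hidden = self.encoder.config.hidden_size
        # Regression head over the [CLS] vector plus uncertainty features.
        self.head = nn.Sequential(
            nn.Linear(hidden + n_uncertainty_feats, 256),
            nn.Tanh(),
            nn.Linear(256, 1),
        )

    def forward(self, input_ids, attention_mask, uncertainty_feats):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]            # sentence representation
        feats = torch.cat([cls, uncertainty_feats], dim=-1)
        return self.head(feats).squeeze(-1)          # predicted DA score

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-large")
model = QERegressor().eval()
batch = tokenizer(["Das ist ein Test."], ["This is a test."],
                  return_tensors="pt", padding=True, truncation=True)
unc = torch.tensor([[0.91, 0.12, 0.05, 0.33]])       # hypothetical features
with torch.no_grad():
    print(model(batch["input_ids"], batch["attention_mask"], unc))
```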
Related papers
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models [57.80514758695275]
Using large language models (LLMs) for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level.
We propose a new prompting method called Error Analysis Prompting (EAPrompt).
This technique emulates the commonly accepted human evaluation framework, Multidimensional Quality Metrics (MQM), and produces explainable and reliable MT evaluations at both the system and segment levels (a prompt-and-scoring sketch follows this entry).
arXiv Detail & Related papers (2023-03-24T05:05:03Z)
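To make the error-analysis prompting idea concrete, here is a minimal sketch of an MQM-style prompt and a segment score derived from the counted errors. The prompt wording, weights, and example reply are assumptions for illustration, not the paper's exact template.

```python
# A minimal sketch (hypothetical prompt and scoring, not the paper's exact
# template) of MQM-style error-analysis prompting: the LLM is asked to list
# major and minor errors, and a segment score is derived from their counts.
MQM_MAJOR_WEIGHT, MQM_MINOR_WEIGHT = -5, -1   # standard MQM-style weights

def build_prompt(source: str, translation: str) -> str:
    return (
        "You are a professional translation evaluator.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "List each translation error on its own line as either "
        "'major: <description>' or 'minor: <description>'. "
        "If there are no errors, answer 'no errors'."
    )

def score_from_response(response: str) -> int:
    """Convert the LLM's error list into an MQM-style segment score."""
    lines = [l.strip().lower() for l in response.splitlines()]
    major = sum(l.startswith("major:") for l in lines)
    minor = sum(l.startswith("minor:") for l in lines)
    return MQM_MAJOR_WEIGHT * major + MQM_MINOR_WEIGHT * minor

# Example with a hypothetical LLM reply:
reply = "major: mistranslated negation\nminor: awkward word order"
print(score_from_response(reply))   # -6
```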
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission, named UniTE, to the sentence-level MQM benchmark of the WMT 2022 Quality Estimation Shared Task.
Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach the 1st overall ranking in the Multilingual and English-Russian settings, and the 2nd overall ranking in the English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work carries out motivated research to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology applied is Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA); a worked sketch follows this entry.
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
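As a worked illustration of the Bernoulli-plus-Monte-Carlo idea, the sketch below treats each evaluated sentence as a pass/fail Bernoulli trial and simulates how the confidence interval for the observed error rate narrows as the sample size grows. The error probability and sample sizes are illustrative assumptions, not values from the paper.

```python
# A minimal sketch (assumed, in the spirit of the paper): estimate a confidence
# interval for a translation error rate via Bernoulli modelling and Monte Carlo
# sampling, and observe how it depends on the sample size n.
import numpy as np

rng = np.random.default_rng(0)

def mc_confidence_interval(p: float, n: int, trials: int = 10_000,
                           level: float = 0.95):
    """Monte Carlo CI for the observed error rate of n Bernoulli(p) samples."""
    rates = rng.binomial(n, p, size=trials) / n    # simulated error rates
    lo, hi = np.quantile(rates, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

for n in (50, 200, 1000):
    lo, hi = mc_confidence_interval(p=0.1, n=n)
    print(f"n={n:4d}: 95% CI for error rate = [{lo:.3f}, {hi:.3f}]")
```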
- The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task [14.629380601429956]
This paper presents the JHU-Microsoft joint submission for the WMT 2021 quality estimation shared task.
We participate only in Task 2 (post-editing effort estimation) of the shared task, focusing on target-side word-level quality estimation.
We demonstrate the competitiveness of our system compared to the widely adopted OpenKiwi-XLM baseline.
arXiv Detail & Related papers (2021-09-17T19:13:31Z)
- Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance with respect to Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs (an ensembling sketch follows this entry).
arXiv Detail & Related papers (2021-09-08T20:13:06Z)
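Below is a minimal sketch of the ensembling step, under the assumption that each member is a fine-tuned multilingual BERT checkpoint with a single-output regression head; the checkpoint names are untrained stand-ins, not the authors' models.

```python
# A minimal sketch (assumed, not the authors' code): average sentence-level
# quality predictions over an ensemble of mBERT regression models.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def ensemble_predict(models, input_ids, attention_mask):
    """Mean of the ensemble members' predicted quality scores."""
    preds = []
    with torch.no_grad():
        for m in models:
            m.eval()
            out = m(input_ids=input_ids, attention_mask=attention_mask)
            preds.append(out.logits.squeeze(-1))   # regression head: one logit
    return torch.stack(preds, dim=0).mean(dim=0)   # average over members

# Stand-ins for K fine-tuned checkpoints (num_labels=1 gives a regression head;
# these heads are untrained here, so the output is only illustrative).
names = ["bert-base-multilingual-cased"] * 3
models = [AutoModelForSequenceClassification.from_pretrained(n, num_labels=1)
          for n in names]
tokenizer = AutoTokenizer.from_pretrained(names[0])
batch = tokenizer(["Das ist ein Test."], ["This is a test."],
                  return_tensors="pt", padding=True)
print(ensemble_predict(models, batch["input_ids"], batch["attention_mask"]))
```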
- Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation [1.7188280334580195]
We focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20).
We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome the QE data-scarcity challenge.
Our best-performing ensemble, which combines models pretrained on individual languages and on different amounts of parallel corpora, achieves a Pearson's correlation of 0.298.
arXiv Detail & Related papers (2021-05-17T06:02:17Z)
- Unsupervised Quality Estimation for Neural Machine Translation [63.38918378182266]
Existing approaches require large amounts of expert-annotated data, computation, and time for training.
We devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required.
We achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models (a sketch follows this entry).
arXiv Detail & Related papers (2020-05-21T12:38:06Z)
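One common way to realize such glass-box scoring, sketched below as an assumption rather than the paper's exact method, is to score a translation by the MT model's own average token log-probability under forced decoding, so no QE training data is needed. The model name and sentence pair are illustrative.

```python
# A minimal sketch (an assumption, not the paper's exact method) of
# unsupervised QE: score a translation by the MT model's own average token
# log-probability under forced decoding; no QE training data is required.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-de-en"          # any seq2seq MT model
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name).eval()

def unsupervised_qe(source: str, translation: str) -> float:
    src = tokenizer(source, return_tensors="pt")
    tgt = tokenizer(text_target=translation, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    # The loss is the mean negative log-likelihood per target token; its
    # negation is the average token log-probability (higher = better).
    return -out.loss.item()

print(unsupervised_qe("Das ist ein Test.", "This is a test."))
```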