Regressive Ensemble for Machine Translation Quality Evaluation
- URL: http://arxiv.org/abs/2109.07242v1
- Date: Wed, 15 Sep 2021 12:22:52 GMT
- Title: Regressive Ensemble for Machine Translation Quality Evaluation
- Authors: Michal Štefánik, Vít Novotný and Petr Sojka
- Abstract summary: This work introduces a simple regressive ensemble for evaluating machine translation quality.
We evaluate the ensemble by its correlation with the expert-based MQM scores of the WMT 2021 Metrics workshop.
In both monolingual and zero-shot cross-lingual settings, we show a significant performance improvement over single metrics.
- Score: 0.4235683368164405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces a simple regressive ensemble for evaluating machine
translation quality based on a set of novel and established metrics. We
evaluate the ensemble by its correlation with the expert-based MQM scores of the
WMT 2021 Metrics workshop. In both monolingual and zero-shot cross-lingual
settings, we show a significant performance improvement over single metrics. In
the cross-lingual settings, we also demonstrate that an ensemble approach is
well-applicable to unseen languages. Furthermore, we identify a strong
reference-free baseline that consistently outperforms the commonly used BLEU
and METEOR measures and significantly improves our ensemble's performance.
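As a minimal sketch of the general recipe (not the authors' released implementation; the metric columns, feature values, and MQM targets below are invented for demonstration), such an ensemble can be a plain regressor fitted on per-segment scores of single metrics and judged by its correlation with expert MQM scores:

```python
# Minimal sketch of a regressive metric ensemble (illustrative only).
# Assumes per-segment scores from several metrics are precomputed as
# features and expert MQM annotations serve as the regression target.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

# One row per segment, one column per single metric (e.g., BLEU, METEOR,
# and a learned metric); all values here are invented for demonstration.
metric_scores = np.array([
    [0.31, 0.45, 0.52],
    [0.62, 0.71, 0.68],
    [0.18, 0.22, 0.30],
    [0.55, 0.60, 0.66],
    [0.40, 0.50, 0.47],
])
mqm_scores = np.array([-5.0, -1.0, -9.0, -2.0, -4.0])  # expert MQM targets

# The ensemble is a plain regression over the single-metric scores.
ensemble = LinearRegression().fit(metric_scores, mqm_scores)

# Evaluate by correlation with the expert MQM scores.
predictions = ensemble.predict(metric_scores)
corr, _ = pearsonr(predictions, mqm_scores)
print(f"Pearson correlation with MQM: {corr:.3f}")
```

For the zero-shot cross-lingual setting, the same regressor would be fitted on some language pairs and applied unchanged to unseen ones.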
Related papers
- Investigating Multilingual Coreference Resolution by Universal Annotations [11.035051211351213]
We study coreference by examining the ground truth data at different linguistic levels.
We perform an error analysis of the most challenging cases that the SotA system fails to resolve.
We extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits.
arXiv Detail & Related papers (2023-10-26T18:50:04Z)
- Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation [160.07938471250048]
Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics.
We develop strong-performing automatic metrics for reference-based summarization evaluation.
arXiv Detail & Related papers (2023-03-07T02:49:50Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission, named UniTE, to the sentence-level MQM benchmark at the Quality Estimation Shared Task.
Specifically, our systems employ the framework of UniTE, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- UniTE: Unified Translation Evaluation [63.58868113074476]
UniTE is the first unified framework capable of handling all three evaluation tasks.
We validate our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks.
arXiv Detail & Related papers (2022-04-28T08:35:26Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We also evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
- Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance in terms of Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs (see the sketch below).
arXiv Detail & Related papers (2021-09-08T20:13:06Z)
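As a hedged illustration of how such an ensemble of regression models is commonly combined and scored (simple prediction averaging, evaluated with Pearson's correlation, MAE, and RMSE; all numbers below are placeholders, and the paper's exact setup may differ):

```python
# Illustrative sketch: averaging predictions of several fine-tuned
# regression models and computing the usual QE evaluation measures.
import numpy as np
from scipy.stats import pearsonr

# Per-segment quality predictions of three fine-tuned regressors and the
# gold scores; all numbers are placeholders, not results from the paper.
model_predictions = np.array([
    [0.72, 0.40, 0.88, 0.55],  # regressor 1
    [0.68, 0.35, 0.90, 0.60],  # regressor 2
    [0.75, 0.42, 0.85, 0.58],  # regressor 3
])
gold = np.array([0.70, 0.30, 0.92, 0.50])

# Ensemble by simple prediction averaging.
ensemble_pred = model_predictions.mean(axis=0)

pearson, _ = pearsonr(ensemble_pred, gold)
mae = np.mean(np.abs(ensemble_pred - gold))
rmse = np.sqrt(np.mean((ensemble_pred - gold) ** 2))
print(f"Pearson={pearson:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```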
- Unbabel's Participation in the WMT20 Metrics Shared Task [8.621669980568822]
We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics.
We intend to participate in the segment-level, document-level, and system-level tracks for all language pairs.
We illustrate our models' results in these tracks using test sets from the previous year.
arXiv Detail & Related papers (2020-10-29T12:59:44Z)
- Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task [30.889496911261677]
This paper describes our contribution to the WMT 2020 Metrics Shared Task.
We make several submissions based on BLEURT, a metric based on transfer learning.
We show how to combine BLEURT's predictions with those of YiSi and how alternative reference translations can enhance performance (see the sketch below).
arXiv Detail & Related papers (2020-10-08T23:16:26Z)
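A hedged sketch of the two ideas in the BLEURT submission above: combining two metrics' predictions by a convex combination, and scoring against alternative references by keeping the best match. The weight and all scores are illustrative, not the authors' scheme:

```python
# Illustrative sketch: combining two metric scores (e.g., a BLEURT-style
# learned metric and YiSi) and handling alternative references.
# All values are placeholders, not numbers from the paper.

def combine(bleurt_score: float, yisi_score: float, w: float = 0.5) -> float:
    """Convex combination of two metric scores."""
    return w * bleurt_score + (1.0 - w) * yisi_score

def multi_reference_score(scores_per_reference: list[float]) -> float:
    """With alternative references, keep the best-matching one."""
    return max(scores_per_reference)

# One candidate translation scored against two alternative references.
bleurt_scores = [0.61, 0.74]
yisi_scores = [0.58, 0.70]
combined = [combine(b, y) for b, y in zip(bleurt_scores, yisi_scores)]
print(multi_reference_score(combined))  # 0.72
```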
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations (a sketch of such a reference-free score follows below).
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
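As a rough sketch of what such a reference-free metric computes (cosine similarity between multilingual embeddings of the source and its translation; the vectors below are placeholders standing in for actual M-BERT or LASER output):

```python
# Illustrative sketch of a reference-free metric: cosine similarity
# between multilingual embeddings of the source and the translation.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder embeddings standing in for encoder output; real vectors
# would come from a pretrained multilingual model such as M-BERT or LASER.
source_embedding = np.array([0.12, -0.40, 0.88, 0.05])
translation_embedding = np.array([0.10, -0.35, 0.80, 0.10])

# Higher similarity is taken as a proxy for translation quality; the
# paper's finding is that this proxy correlates poorly with human
# judgments when off-the-shelf encoders are used.
print(f"reference-free score: {cosine(source_embedding, translation_embedding):.3f}")
```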
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.