Mismatching-Aware Unsupervised Translation Quality Estimation For
Low-Resource Languages
- URL: http://arxiv.org/abs/2208.00463v3
- Date: Sun, 3 Mar 2024 11:27:26 GMT
- Title: Mismatching-Aware Unsupervised Translation Quality Estimation For
Low-Resource Languages
- Authors: Fatemeh Azadi, Heshaam Faili, Mohammad Javad Dousti
- Abstract summary: XLMRScore is a cross-lingual counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model.
We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task.
- Score: 6.049660810617423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translation Quality Estimation (QE) is the task of predicting the quality of
machine translation (MT) output without any reference. This task has gained
increasing attention as an important component in the practical applications of
MT. In this paper, we first propose XLMRScore, which is a cross-lingual
counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model. This metric
can be used as a simple unsupervised QE method, nevertheless facing two issues:
firstly, the untranslated tokens leading to unexpectedly high translation
scores, and secondly, the issue of mismatching errors between source and
hypothesis tokens when applying the greedy matching in XLMRScore. To mitigate
these issues, we suggest replacing untranslated words with the unknown token
and the cross-lingual alignment of the pre-trained model to represent aligned
words closer to each other, respectively. We evaluate the proposed method on
four low-resource language pairs of the WMT21 QE shared task, as well as a new
English$\rightarrow$Persian (En-Fa) test dataset introduced in this paper.
Experiments show that our method could get comparable results with the
supervised baseline for two zero-shot scenarios, i.e., with less than 0.01
difference in Pearson correlation, while outperforming unsupervised rivals in
all the low-resource language pairs for above 8%, on average.
Related papers
- QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation [25.165239478219267]
We propose a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution.
Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm.
arXiv Detail & Related papers (2024-05-28T17:36:06Z) - Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on all two sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z) - Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z) - Alibaba-Translate China's Submission for WMT 2022 Quality Estimation
Shared Task [80.22825549235556]
We present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE.
Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - Rethink about the Word-level Quality Estimation for Machine Translation
from Human Judgement [57.72846454929923]
We create a benchmark dataset, emphHJQE, where the expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely tag refinement strategy and tree-based annotation strategy, to make the TER-based artificial QE corpus closer to emphHJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z) - A new approach to calculating BERTScore for automatic assessment of
translation quality [0.0]
The study focuses on the applicability of the BERTScore metric to translation quality assessment at the sentence level for English -> Russian direction.
Experiments were performed with a pre-trained Multilingual BERT as well as with a pair of Monolingual BERT models.
It was demonstrated that such transformation helps to prevent mismatching issue and shown that this approach gives better results than using embeddings of the Multilingual model.
arXiv Detail & Related papers (2022-03-10T19:25:16Z) - Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance with respect to the Pearson's correlation and beats the baseline system in MAE/ RMSE for several language pairs.
arXiv Detail & Related papers (2021-09-08T20:13:06Z) - Verdi: Quality Estimation and Error Detection for Bilingual [23.485380293716272]
Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora.
We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor.
Our method beats the winner of the competition and outperforms other baseline methods by a great margin.
arXiv Detail & Related papers (2021-05-31T11:04:13Z) - On the Limitations of Cross-lingual Encoders as Exposed by
Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns ourselves with reference-free machine translation (MT) evaluation where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.