Unsupervised Translation Quality Estimation Exploiting Synthetic Data and Pre-trained Multilingual Encoder
 - URL: http://arxiv.org/abs/2311.05117v1
 - Date: Thu, 9 Nov 2023 03:10:42 GMT
 - Title: Unsupervised Translation Quality Estimation Exploiting Synthetic Data and Pre-trained Multilingual Encoder
 - Authors: Yuto Kuroda, Atsushi Fujita, Tomoyuki Kajiwara, Takashi Ninomiya
 - Abstract summary: We extensively investigate the usefulness of synthetic TQE data and pre-trained multilingual encoders in unsupervised sentence-level TQE.
Our experiments on the WMT20 and WMT21 datasets revealed that this approach can outperform other unsupervised TQE methods on high- and low-resource translation directions.
 - Score: 17.431776840662273
 - License: http://creativecommons.org/licenses/by-sa/4.0/
 - Abstract:   Translation quality estimation (TQE) is the task of predicting translation
quality without reference translations. Due to the enormous cost of creating
training data for TQE, only a few translation directions can benefit from
supervised training. To address this issue, unsupervised TQE methods have been
studied. In this paper, we extensively investigate the usefulness of synthetic
TQE data and pre-trained multilingual encoders in unsupervised sentence-level
TQE, both of which have proven effective in supervised training scenarios.
Our experiments on the WMT20 and WMT21 datasets revealed that this
approach can outperform other unsupervised TQE methods on high- and
low-resource translation directions in predicting post-editing effort and human
evaluation score, and some zero-resource translation directions in predicting
post-editing effort.
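
The abstract leaves the exact recipe implicit. Below is a minimal sketch under common assumptions that are not necessarily the authors' setup: XLM-R as the pre-trained multilingual encoder, and synthetic pseudo-labels obtained by scoring NMT hypotheses against references with a TER-style metric.

    # Minimal sketch, NOT the authors' exact recipe: fine-tune a multilingual
    # encoder to regress a quality score from (source, MT hypothesis) pairs,
    # supervised only by synthetic TER-style pseudo-labels.
    import torch
    from torch import nn
    from transformers import AutoModel, AutoTokenizer

    class SentenceQEHead(nn.Module):
        """Regress a sentence-level quality score from a multilingual encoder."""
        def __init__(self, encoder_name: str = "xlm-roberta-base"):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            self.regressor = nn.Linear(self.encoder.config.hidden_size, 1)

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]        # pooled [CLS] representation
            return self.regressor(cls).squeeze(-1)   # predicted quality score

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = SentenceQEHead()

    # Synthetic training pair: (source, MT hypothesis) with a TER-style pseudo-label.
    batch = tokenizer("Das ist ein Test .", "This is test .",
                      return_tensors="pt", truncation=True)
    pseudo_label = torch.tensor([0.25])  # e.g., TER(hypothesis, reference)
    loss = nn.functional.mse_loss(model(**batch), pseudo_label)
    loss.backward()

At inference time, the same encoder scores unseen (source, translation) pairs directly, which is what makes the approach unsupervised with respect to human quality labels.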
 
        Related papers
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation [55.73341401764367]
We introduce ADSQE, a novel framework for alleviating distribution shift in synthetic QE data.
ADSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes.
Experiments demonstrate that ADSQE outperforms SOTA baselines like COMET in both supervised and unsupervised settings.
arXiv Detail & Related papers (2025-02-27T10:11:53Z)
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
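
As a hedged sketch of how such an EOS-confidence penalty could enter beam scoring: the threshold, weight, and functional form below are illustrative assumptions, not the paper's exact formulation.

    # Sketch: penalize candidates whose EOS confidence at the stopping step
    # is low, flagging likely under-translations during beam search.
    import math

    def penalized_score(log_p_candidate, log_p_eos_at_stop,
                        threshold=math.log(0.5), alpha=1.0):
        """Subtract a penalty proportional to the EOS-confidence shortfall."""
        # Low P(EOS) at the step where decoding stopped signals that the
        # candidate may have ended prematurely.
        penalty = alpha * max(0.0, threshold - log_p_eos_at_stop)
        return log_p_candidate - penalty

    # A candidate that stopped with only 10% EOS probability is down-weighted
    # relative to one that stopped with 90% EOS confidence.
    risky = penalized_score(-5.0, math.log(0.10))
    safe = penalized_score(-5.2, math.log(0.90))
    assert risky < safe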
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
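
To make the perturbation idea concrete, here is a minimal sketch under simplifying assumptions: deletion-based perturbations and a survival-count stability score are illustrative choices, and `translate` stands in for any black-box MT system rather than a real API.

    # Sketch: score each output token by how often it survives when the
    # source is perturbed; unstable tokens suggest low-confidence output.
    from collections import Counter

    def word_level_stability(src, translate):
        base_tokens = translate(src).split()
        src_tokens = src.split()
        counts = Counter()
        for i in range(len(src_tokens)):
            # Perturb the source by dropping one token, then re-translate.
            perturbed = " ".join(src_tokens[:i] + src_tokens[i + 1:])
            hyp_tokens = set(translate(perturbed).split())
            for tok in set(base_tokens):
                if tok in hyp_tokens:
                    counts[tok] += 1
        n = max(1, len(src_tokens))
        # Stability in [0, 1]; low values flag potentially erroneous words.
        return [(tok, counts[tok] / n) for tok in base_tokens]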
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that occur without any awareness of translation quality.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Understanding and Mitigating the Uncertainty in Zero-Shot Translation [92.25357943169601]
We aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation.
We propose two lightweight and complementary approaches to denoise the training data for model training.
Our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines.
arXiv Detail & Related papers (2022-05-20T10:29:46Z)
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage of training and then tune the model normally.
Experiments show that DoT consistently improves neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
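
The summary does not specify the noise functions; the sketch below illustrates one common choice (token dropping plus local shuffling, an assumption rather than the paper's design) for producing the noisy inputs a denoising task reconstructs.

    # Toy illustration of a denoising objective: corrupt a sentence, then
    # train the model to reconstruct the clean sentence from the noisy one.
    import random

    def add_noise(tokens, drop_prob=0.1, shuffle_window=3, seed=None):
        rng = random.Random(seed)
        kept = [t for t in tokens if rng.random() >= drop_prob]  # token dropping
        # Local shuffle: each token moves at most shuffle_window - 1 positions.
        keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
        return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]

    clean = "the quick brown fox jumps over the lazy dog".split()
    noisy = add_noise(clean, seed=0)  # training input; `clean` is the target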
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work carries out motivated research to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology applied in this work draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
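
To make the sample-size dependence concrete, here is a worked example using the Wilson score interval recommended by Brown et al. (2001) for a Bernoulli proportion (here, the fraction of sentences judged erroneous); the counts are illustrative, not taken from the paper.

    # Wilson score interval for a Bernoulli proportion: the same observed
    # error rate yields a much wider interval for a smaller sample.
    import math

    def wilson_interval(errors: int, n: int, z: float = 1.96):
        p = errors / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return center - half, center + half

    print(wilson_interval(10, 50))    # ~ (0.112, 0.330) at 20% error rate
    print(wilson_interval(100, 500))  # ~ (0.167, 0.237) at the same rate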
- Exploiting Curriculum Learning in Unsupervised Neural Machine Translation [28.75229367700697]
We propose a curriculum learning method to gradually utilize pseudo bi-texts based on their quality from multiple granularities.
Experimental results on WMT14 En-Fr, WMT16 En-De, WMT16 En-Ro, and LDC En-Zh translation tasks demonstrate that the proposed method achieves consistent improvements with faster convergence speed.
arXiv Detail & Related papers (2021-09-23T07:18:06Z)
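
A simplified sketch of such a quality-based curriculum follows; the quality scorer and the stage schedule are assumptions, not the paper's exact design.

    # Sketch: rank pseudo bi-texts by an assumed quality scorer and release
    # lower-quality pairs to the trainer only in later curriculum stages.
    def curriculum_pools(pairs, quality, n_stages=3):
        """pairs: (src, tgt) pseudo bi-texts; quality: (src, tgt) -> float."""
        ranked = sorted(pairs, key=lambda p: quality(*p), reverse=True)
        stage_size = max(1, len(ranked) // n_stages)
        for stage in range(n_stages):
            end = len(ranked) if stage == n_stages - 1 else (stage + 1) * stage_size
            yield ranked[:end]  # the training pool grows stage by stage

Training then iterates over each successive pool, so high-quality pseudo bi-texts dominate early updates while noisier ones enter later.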
- Verdi: Quality Estimation and Error Detection for Bilingual Corpora [23.485380293716272]
Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora.
We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor.
Our method beats the winner of the competition and outperforms other baseline methods by a large margin.
arXiv Detail & Related papers (2021-05-31T11:04:13Z)
- Revisiting Round-Trip Translation for Quality Estimation [0.0]
Quality estimation (QE) is the task of automatically evaluating the quality of translations without human-translated references.
In this paper, we apply semantic embeddings to round-trip translation (RTT)-based QE.
Our method achieves the highest correlations with human judgments, compared to previous WMT 2019 quality estimation metric task submissions.
arXiv Detail & Related papers (2020-04-29T03:20:22Z)
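
A minimal sketch of RTT-based QE with semantic embeddings: sentence-transformers is assumed as the embedder and the forward and backward MT systems are treated as black-box callables; the paper's actual models may differ.

    # Sketch: estimate quality as the embedding similarity between the
    # original source and its round-trip translation.
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim

    embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    def rtt_qe(src: str, translate_fwd, translate_back) -> float:
        hyp = translate_fwd(src)            # source language -> target language
        round_trip = translate_back(hyp)    # target language -> source language
        e_src, e_rtt = embedder.encode([src, round_trip])
        return float(cos_sim(e_src, e_rtt))  # higher = better estimated quality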
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
Our method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.