Ensemble-based Transfer Learning for Low-resource Machine Translation
Quality Estimation
- URL: http://arxiv.org/abs/2105.07622v1
- Date: Mon, 17 May 2021 06:02:17 GMT
- Title: Ensemble-based Transfer Learning for Low-resource Machine Translation
Quality Estimation
- Authors: Ting-Wei Wu, Yung-An Hsieh, Yi-Chieh Liu
- Abstract summary: We focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20).
We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome this QE data scarcity challenge.
We achieve the best performance on the ensemble model combining the models pretrained by individual languages as well as different levels of parallel trained corpus with a Pearson's correlation of 0.298.
- Score: 1.7188280334580195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality Estimation (QE) of Machine Translation (MT) is a task to estimate the
quality scores for given translation outputs from an unknown MT system.
However, QE scores for low-resource languages are usually intractable and hard
to collect. In this paper, we focus on the Sentence-Level QE Shared Task of the
Fifth Conference on Machine Translation (WMT20), but in a more challenging
setting. We aim to predict QE scores for given translation outputs when almost
no QE scores for that language pair are available during training. We
propose an ensemble-based predictor-estimator QE model with transfer learning
to overcome this QE data scarcity challenge by leveraging QE scores from other
miscellaneous languages and translation results of targeted languages. Based on
the evaluation results, we provide a detailed analysis of how each of our
extensions affects the reliability and generalization ability of QE models
when performing transfer learning under multilingual tasks. Finally, we achieve the
best performance on the ensemble model combining the models pretrained by
individual languages as well as different levels of parallel trained corpus
with a Pearson's correlation of 0.298, which is 2.54 times higher than
baselines.
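The abstract reports results as a Pearson's correlation between predicted and reference QE scores for an ensemble of models. The following is a minimal sketch of that evaluation, assuming the ensemble simply averages per-sentence predictions; the abstract does not state how the models are combined, so the averaging rule and the function names `ensemble_average` and `pearson` are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import fmean


def ensemble_average(model_predictions):
    """Average per-sentence scores across models.

    Assumption: a plain unweighted mean. The paper's actual combination
    rule is not given in the abstract.
    """
    lengths = {len(scores) for scores in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of sentences")
    # zip(*...) groups the scores that each model gave to the same sentence.
    return [fmean(per_sentence) for per_sentence in zip(*model_predictions)]


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical scores from two QE models and hypothetical reference scores.
    model_a = [0.2, 0.5, 0.9, 0.4]
    model_b = [0.1, 0.7, 0.8, 0.3]
    reference = [0.0, 0.6, 1.0, 0.5]
    combined = ensemble_average([model_a, model_b])
    print(f"Pearson r of ensemble vs. reference: {pearson(combined, reference):.3f}")
```

The numbers in the usage section are made up for illustration and are unrelated to the 0.298 figure reported in the abstract.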
Related papers
- Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect the incorrect translations and assigns a penalty term to the reward scores of them.
arXiv Detail & Related papers (2024-01-23T16:07:43Z)
- Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation [0.6998085564793366]
This work introduces QE-fusion, a method that synthesizes translations using a quality estimation metric (QE).
We demonstrate that our approach generates novel translations in over half of the cases.
We empirically establish that QE-fusion scales linearly with the number of candidates in the pool.
arXiv Detail & Related papers (2024-01-12T16:52:41Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- QAmeleon: Multilingual QA with Only 5 Examples [71.80611036543633]
We show how to leverage pre-trained language models under a few-shot learning setting.
Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained.
Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines.
arXiv Detail & Related papers (2022-11-15T16:14:39Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE.
Specifically, our systems employ the framework of UniTE, which combined three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task [24.668012925628968]
We present our submissions to the WMT 2021 QE shared task.
We propose several useful features to evaluate the uncertainty of the translations to build our QE system, named QEMind.
We show that our multilingual systems outperform the best system in the Direct Assessment QE task of WMT 2020.
arXiv Detail & Related papers (2021-12-30T02:27:29Z)
- Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance with respect to the Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs.
arXiv Detail & Related papers (2021-09-08T20:13:06Z)
- An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers [3.4355075318742165]
We show that multilingual, word-level QE models perform on par with the current language-specific models.
In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs.
arXiv Detail & Related papers (2021-05-31T23:21:10Z)
- Verdi: Quality Estimation and Error Detection for Bilingual [23.485380293716272]
Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora.
We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor.
Our method beats the winner of the competition and outperforms other baseline methods by a great margin.
arXiv Detail & Related papers (2021-05-31T11:04:13Z)