Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task
- URL: http://arxiv.org/abs/2309.13230v4
- Date: Mon, 11 Dec 2023 10:53:48 GMT
- Title: Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task
- Authors: Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen,
Shujian Huang
- Abstract summary: We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
- Score: 59.46906545506715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the submissions of the NJUNLP team to the WMT 2023 Quality
Estimation (QE) shared task. Our team submitted predictions for the
English-German language pair on both sub-tasks: (i) sentence- and word-level
quality prediction; and (ii) fine-grained error span detection. This year, we
further explore pseudo data methods for QE based on the NJUQE framework
(https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel
data from the WMT translation task. We pre-train the XLMR large model on pseudo
QE data, then fine-tune it on real QE data. At both stages, we jointly learn
sentence-level scores and word-level tags. Empirically, we conduct experiments
to find the key hyper-parameters that improve performance. Technically, we
propose a simple method that converts the word-level outputs into fine-grained
error span results. Overall, our models achieved the best results in
English-German for both word-level and fine-grained error span detection
sub-tasks by a considerable margin.
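
The abstract does not spell out how sentence-level scores and word-level tags are learned jointly. Below is a minimal sketch of one plausible setup, assuming an XLMR encoder with a regression head on the first token for the sentence score, a two-way OK/BAD classifier per token, and a simple weighted sum of the two losses; the head names and loss weighting are illustrative assumptions, not the paper's confirmed architecture:

```python
# Hedged sketch of joint sentence-level + word-level QE training.
# Assumptions (not from the paper): regression head on the <s> token for the
# sentence score, a 2-way OK/BAD classifier per token, MSE + cross-entropy
# combined with an illustrative weight.
import torch.nn as nn
import torch.nn.functional as F
from transformers import XLMRobertaModel

class JointQEModel(nn.Module):
    def __init__(self, model_name="xlm-roberta-large", tag_loss_weight=1.0):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.score_head = nn.Linear(hidden, 1)  # sentence-level quality score
        self.tag_head = nn.Linear(hidden, 2)    # word-level OK/BAD logits
        self.tag_loss_weight = tag_loss_weight

    def forward(self, input_ids, attention_mask, scores=None, tags=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        pred_scores = self.score_head(states[:, 0]).squeeze(-1)  # <s> token
        tag_logits = self.tag_head(states)                       # (B, T, 2)
        loss = None
        if scores is not None and tags is not None:
            # Joint objective: sentence regression + word tagging.
            loss = F.mse_loss(pred_scores, scores) + self.tag_loss_weight * \
                F.cross_entropy(tag_logits.view(-1, 2), tags.view(-1),
                                ignore_index=-100)  # -100 masks padding
        return loss, pred_scores, tag_logits
```

Under this reading, pre-training on pseudo QE data and fine-tuning on real QE data would reuse the same joint objective, only swapping the dataset.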
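The "simple method" for converting word-level outputs to error spans is likewise not specified in the abstract. One natural reading, sketched below, merges runs of consecutive BAD-tagged target words into character-offset spans; the merging rule and the fixed severity label are assumptions:

```python
# Hedged sketch: turn per-word OK/BAD tags into (start, end, severity) error
# spans by merging runs of consecutive BAD words. Offsets index into the
# space-joined target string; the severity label is a placeholder assumption.
def tags_to_spans(words, tags, severity="major"):
    spans, offset, current = [], 0, None
    for word, tag in zip(words, tags):
        start, end = offset, offset + len(word)
        if tag == "BAD":
            if current is None:
                current = [start, end]     # open a new span
            else:
                current[1] = end           # extend the open span
        elif current is not None:
            spans.append((current[0], current[1], severity))
            current = None
        offset = end + 1                   # account for the joining space
    if current is not None:
        spans.append((current[0], current[1], severity))
    return spans

words = ["Das", "ist", "ein", "schlechter", "Satz", "."]
tags = ["OK", "OK", "BAD", "BAD", "OK", "OK"]
print(tags_to_spans(words, tags))  # [(8, 22, 'major')]
```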
Related papers
- NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission to the sentence-level MQM benchmark at the WMT 2022 Quality Estimation Shared Task, named UniTE.
Specifically, our systems employ the framework of UniTE, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, where expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely tag refinement strategy and tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Mismatching-Aware Unsupervised Translation Quality Estimation for Low-Resource Languages [6.049660810617423]
XLMRScore is a cross-lingual counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model.
We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task.
arXiv Detail & Related papers (2022-07-31T16:23:23Z)
- QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task [24.668012925628968]
We present our submissions to the WMT 2021 QE shared task.
We propose several useful features to evaluate the uncertainty of the translations to build our QE system, named QEMind.
We show that our multilingual systems outperform the best system in the Direct Assessment QE task of WMT 2020.
arXiv Detail & Related papers (2021-12-30T02:27:29Z)
- Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance with respect to Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs.
arXiv Detail & Related papers (2021-09-08T20:13:06Z)