Unbabel's Participation in the WMT20 Metrics Shared Task
- URL: http://arxiv.org/abs/2010.15535v1
- Date: Thu, 29 Oct 2020 12:59:44 GMT
- Title: Unbabel's Participation in the WMT20 Metrics Shared Task
- Authors: Ricardo Rei, Craig Stewart, Catarina Farinha, Alon Lavie
- Abstract summary: We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics.
We intend to participate in the segment-level, document-level and system-level tracks on all language pairs.
We illustrate results of our models in these tracks with reference to test sets from the previous year.
- Score: 8.621669980568822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the contribution of the Unbabel team to the WMT 2020 Shared Task
on Metrics. We intend to participate in the segment-level, document-level and
system-level tracks on all language pairs, as well as the 'QE as a Metric'
track. Accordingly, we illustrate results of our models in these tracks with
reference to test sets from the previous year. Our submissions build upon the
recently proposed COMET framework: We train several estimator models to regress
on different human-generated quality scores and a novel ranking model trained
on relative ranks obtained from Direct Assessments. We also propose a simple
technique for converting segment-level predictions into a document-level score.
Overall, our systems achieve strong results for all language pairs on previous
test sets and in many cases set a new state-of-the-art.
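For readers who want to try the scoring pipeline the abstract describes, below is a minimal sketch using the open-source `unbabel-comet` package. The checkpoint name reflects a current public release rather than the 2020 submission, and the length-weighted document aggregation is an assumption about the paper's "simple technique", not a confirmed reimplementation.

```python
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

# Checkpoint name is illustrative (a current public COMET model); the
# 2020 submission used earlier, task-specific checkpoints.
model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

data = [
    {"src": "Dem Feuer konnte Einhalt geboten werden",
     "mt":  "The fire could be stopped",
     "ref": "They were able to control the fire."},
]
prediction = model.predict(data, batch_size=8, gpus=0)
print(prediction.scores)  # one quality score per segment


def document_score(scores, lengths):
    """Assumption: collapse segment scores into a document-level score by
    weighting each segment by its token count; the paper's exact
    aggregation may differ."""
    return sum(s * n for s, n in zip(scores, lengths)) / sum(lengths)
```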
Related papers
- The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics [36.52897053496835]
Generative large language models (LLMs) have shown remarkable capabilities in solving tasks with minimal or no task-related examples.
We introduce the Eval4NLP 2023 shared task, which asks participants to explore prompting and score extraction for machine translation (MT) and summarization evaluation (a toy score-extraction sketch follows this entry).
We present an overview of participants' approaches and evaluate them on a new reference-free test set spanning three language pairs for MT and a summarization dataset.
arXiv Detail & Related papers (2023-10-30T17:55:08Z)
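As a toy illustration of the score-extraction step this task asks participants to explore, the snippet below pulls a numeric score out of free-form LLM output. The prompt template is hypothetical and not taken from any participant system.

```python
import re

# Hypothetical prompt template; actual Eval4NLP prompts are participant-specific.
PROMPT = ("Rate the following translation from {src_lang} to {tgt_lang} "
          "on a 0-100 scale.\nSource: {src}\nTranslation: {mt}\nScore:")


def extract_score(llm_output: str) -> float | None:
    """Return the first number found in the model's free-form answer."""
    match = re.search(r"-?\d+(?:\.\d+)?", llm_output)
    return float(match.group()) if match else None


print(extract_score("I would rate this translation 87 out of 100."))  # 87.0
```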
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present UniTE, our submission to the sentence-level MQM benchmark of the WMT 2022 Quality Estimation Shared Task.
Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model (an input-construction sketch follows this entry).
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
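The three input formats UniTE combines are commonly described as source-only, reference-only, and a unified source-plus-reference input. The sketch below approximates how such inputs could be built for a pretrained encoder; the tokenizer choice and concatenation layout are assumptions, not the paper's exact recipe.

```python
# pip install transformers
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # illustrative encoder


def unite_style_inputs(hyp: str, src: str, ref: str) -> dict:
    """Approximate the three UniTE-style evaluation inputs; the exact
    special-token layout follows the UniTE paper, not this sketch."""
    return {
        "src-only": tok(hyp, src, truncation=True),
        "ref-only": tok(hyp, ref, truncation=True),
        "unified":  tok(hyp, src + f" {tok.sep_token} " + ref, truncation=True),
    }
```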
- Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task [61.34108034582074]
We build our system based on the core idea of UNITE (Unified Translation Evaluation).
During the model pre-training phase, we first use pseudo-labeled data examples to continue pre-training UNITE.
During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions.
arXiv Detail & Related papers (2022-10-18T08:51:25Z)
- RoBLEURT Submission for the WMT2021 Metrics Task [72.26898579202076]
We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations on 8 out of 10 to-English language pairs.
arXiv Detail & Related papers (2022-04-28T08:49:40Z)
- TransQuest at WMT2020: Sentence-Level Direct Assessment [14.403165053223395]
We introduce a simple QE framework based on cross-lingual transformers.
We use it to implement and evaluate two different neural architectures.
Our approach is the winning solution in all language pairs according to the WMT 2020 official results.
arXiv Detail & Related papers (2020-10-11T18:53:05Z)
- Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task [30.889496911261677]
This paper describes our contribution to the WMT 2020 Metrics Shared Task.
We make several submissions based on BLEURT, a metric based on transfer learning.
We show how to combine BLEURT's predictions with those of YiSi and how to use alternative reference translations to enhance performance (a toy combination sketch follows this entry).
arXiv Detail & Related papers (2020-10-08T23:16:26Z)
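One simple way to combine two metrics' segment scores, as the BLEURT submission does with YiSi, is to average them after per-metric standardization. The scheme below is a plausible baseline, not the submission's exact combination.

```python
from statistics import mean, stdev


def z_norm(xs):
    """Standardize one metric's segment scores to zero mean, unit variance."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def combine(bleurt_scores, yisi_scores):
    # Assumption: an unweighted average of standardized scores; the
    # submission's actual combination weights may differ.
    return [mean(p) for p in zip(z_norm(bleurt_scores), z_norm(yisi_scores))]
```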
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.