SurreyAI 2023 Submission for the Quality Estimation Shared Task
- URL: http://arxiv.org/abs/2312.00525v1
- Date: Fri, 1 Dec 2023 12:01:04 GMT
- Title: SurreyAI 2023 Submission for the Quality Estimation Shared Task
- Authors: Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu
Ranasinghe
- Abstract summary: This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment task in WMT23.
The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models.
The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments.
- Score: 17.122657128702276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality Estimation (QE) systems are important in situations where it is
necessary to assess the quality of translations, but there is no reference
available. This paper describes the approach adopted by the SurreyAI team for
addressing the Sentence-Level Direct Assessment shared task in WMT23. The
proposed approach builds upon the TransQuest framework, exploring various
autoencoder pre-trained language models within the MonoTransQuest architecture
using single and ensemble settings. The autoencoder pre-trained language models
employed in the proposed systems are XLMV, InfoXLM-large, and XLMR-large. The
evaluation utilizes Spearman and Pearson correlation coefficients, assessing
the relationship between machine-predicted quality scores and human judgments
for 5 language pairs (English-Gujarati, English-Hindi, English-Marathi,
English-Tamil and English-Telugu). The MonoTQ-InfoXLM-large approach emerges as
the most robust strategy, outperforming all the other individual models proposed
in this study and improving significantly over the baseline for the majority of
the language pairs.
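As a rough illustration of the setup described above, the sketch below loads a sentence-level MonoTransQuest model through the TransQuest library, predicts quality scores for source-translation pairs, and evaluates them with Spearman and Pearson correlations. The checkpoint name is TransQuest's public multilingual DA release, and the sentence pairs and gold scores are invented for illustration; the actual submission trained XLMV, InfoXLM-large, and XLMR-large variants.

```python
# A minimal sketch, assuming the TransQuest package (pip install transquest).
# The checkpoint is TransQuest's public multilingual DA model; the sentence
# pairs and gold scores below are invented for illustration only.
from scipy.stats import pearsonr, spearmanr
from transquest.algo.sentence_level.monotransquest.run_model import MonoTransQuestModel

# MonoTransQuest: a single cross-lingual encoder regressing a DA quality score
# from the concatenated source and translation (num_labels=1 for regression).
model = MonoTransQuestModel(
    "xlmroberta",
    "TransQuest/monotransquest-da-multilingual",
    num_labels=1,
    use_cuda=False,
)

pairs = [
    ["The weather is nice today.", "Aaj mausam accha hai."],
    ["She is reading a book.", "Vah kitaab padh rahi hai."],
    ["The meeting was cancelled.", "Baithak aayojit ki gayi."],  # mistranslation
]
predictions, raw_outputs = model.predict(pairs)

# An ensemble setting can simply average the scores of several such models:
# ensemble = np.mean([m.predict(pairs)[0] for m in models], axis=0)

human_scores = [0.85, 0.80, 0.20]  # hypothetical human DA judgments
print("Spearman:", spearmanr(predictions, human_scores).correlation)
print("Pearson: ", pearsonr(predictions, human_scores)[0])
```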
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting reasonable benchmarks from the massive pool available, addressing the oversight of benchmark utility in previous work.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation [9.244878233604819]
This paper investigates the development and evaluation of machine translation models from Cantonese to English.
A new parallel corpus has been created by combining, preprocessing, and cleaning different corpora available online.
A monolingual Cantonese dataset has been created through web scraping to aid the synthetic parallel corpus generation.
arXiv Detail & Related papers (2024-05-13T20:37:04Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Rethinking Word-Level Auto-Completion in Computer-Aided Translation [76.34184928621477]
Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation.
It aims at providing word-level auto-completion suggestions for human translators.
We introduce a measurable criterion for what makes a good auto-completion suggestion and discover that existing WLAC models often fail to meet it.
We propose an effective approach to enhance WLAC performance by promoting adherence to the criterion.
arXiv Detail & Related papers (2023-10-23T03:11:46Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech to text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission, named UniTE, to the sentence-level MQM benchmark at the Quality Estimation Shared Task.
Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- Ensemble Fine-tuned mBERT for Translation Quality Estimation [0.0]
In this paper, we discuss our submission to the WMT 2021 QE Shared Task.
Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models.
It demonstrates comparable performance in terms of Pearson's correlation and beats the baseline system in MAE/RMSE for several language pairs.
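For a sense of how such an ensemble is scored, here is a minimal sketch (with invented numbers) of averaging per-model regression outputs and computing Pearson's correlation, MAE, and RMSE:

```python
# A minimal sketch of ensemble averaging and Pearson/MAE/RMSE evaluation;
# all predictions and gold scores below are invented for illustration.
import numpy as np
from scipy.stats import pearsonr

model_predictions = np.array([
    [0.71, 0.40, 0.88],  # hypothetical scores from mBERT model 1
    [0.65, 0.45, 0.90],  # hypothetical scores from mBERT model 2
    [0.70, 0.38, 0.85],  # hypothetical scores from mBERT model 3
])
gold = np.array([0.68, 0.42, 0.91])  # hypothetical human DA scores

ensemble = model_predictions.mean(axis=0)  # simple ensemble: average scores
mae = np.abs(ensemble - gold).mean()
rmse = np.sqrt(((ensemble - gold) ** 2).mean())
print(f"Pearson: {pearsonr(ensemble, gold)[0]:.3f}  MAE: {mae:.3f}  RMSE: {rmse:.3f}")
```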
arXiv Detail & Related papers (2021-09-08T20:13:06Z)
- An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers [3.4355075318742165]
We show that multilingual, word-level QE models perform on par with the current language-specific models.
In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs.
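TransQuest also ships word-level models; a minimal sketch of predicting OK/BAD tags with its documented MicroTransQuest interface follows. The checkpoint name is a public English-German release; applying such a model to an unseen language pair would correspond to the zero-shot setting discussed above.

```python
# A minimal sketch, assuming the TransQuest package and its public
# word-level checkpoint; the sentence pair follows the library's docs.
from transquest.algo.word_level.microtransquest.run_model import MicroTransQuestModel

model = MicroTransQuestModel(
    "xlmroberta",
    "TransQuest/microtransquest-en_de-wiki",  # public En-De word-level model
    labels=["OK", "BAD"],
    use_cuda=False,
)

# Returns one quality tag per source token and per target token.
source_tags, target_tags = model.predict(
    [["if not , you may need to restart your computer",
      "falls nicht , müssen sie ihren computer neu starten"]]
)
print(source_tags[0])
print(target_tags[0])
```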
arXiv Detail & Related papers (2021-05-31T23:21:10Z)
- Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation [1.7188280334580195]
We focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20).
We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome the QE data scarcity challenge.
The best performance is achieved by an ensemble model combining models pre-trained on individual languages with models trained on different amounts of parallel data, reaching a Pearson's correlation of 0.298.
arXiv Detail & Related papers (2021-05-17T06:02:17Z)
- COMET: A Neural Framework for MT Evaluation [8.736370689844682]
We present COMET, a neural framework for training multilingual machine translation evaluation models.
Our framework exploits information from both the source input and a target-language reference translation in order to more accurately predict MT quality.
Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
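A minimal sketch of scoring translations with the released COMET library follows (assuming the unbabel-comet package; the checkpoint below is a later public release, not the WMT 2019 models from the paper):

```python
# A minimal sketch, assuming the unbabel-comet package (pip install unbabel-comet).
# The checkpoint is a public release; the example sentences are invented.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# COMET scores each sample from the source, the MT hypothesis, and a reference.
data = [
    {
        "src": "Der Zug ist abgefahren.",
        "mt": "The train has left.",
        "ref": "The train has departed.",
    }
]
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment quality scores
print(output.system_score)  # corpus-level average
```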
arXiv Detail & Related papers (2020-09-18T18:54:15Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where source texts are directly compared to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
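As a stand-in illustration of this reference-free setup (using the sentence-transformers library with a multilingual encoder rather than the paper's M-BERT/LASER pipelines), one can embed the source and the system translation and take their cosine similarity as a quality proxy:

```python
# A minimal sketch of reference-free scoring via cross-lingual embeddings.
# sentence-transformers with a multilingual model is a stand-in here for the
# M-BERT/LASER representations investigated in the paper.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sources = ["Der Zug ist bereits abgefahren."]
translations = ["The train has already left."]  # system output, no reference

src_emb = encoder.encode(sources, convert_to_tensor=True)
mt_emb = encoder.encode(translations, convert_to_tensor=True)

# Cosine similarity between source and translation as a quality proxy;
# the paper finds such raw similarities correlate poorly with human judgments.
scores = util.cos_sim(src_emb, mt_emb).diagonal()
print(scores)
```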
arXiv Detail & Related papers (2020-05-03T22:10:23Z)