FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation
- URL: http://arxiv.org/abs/2210.00193v3
- Date: Tue, 3 Oct 2023 17:20:04 GMT
- Title: FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation
- Authors: Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan
Garrette, Jason Riesa, Orhan Firat, Noah Constant
- Abstract summary: We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present FRMT, a new dataset and evaluation benchmark for Few-shot
Region-aware Machine Translation, a type of style-targeted translation. The
dataset consists of professional translations from English into two regional
variants each of Portuguese and Mandarin Chinese. Source documents are selected
to enable detailed analysis of phenomena of interest, including lexically
distinct terms and distractor terms. We explore automatic evaluation metrics
for FRMT and validate their correlation with expert human evaluation across
both region-matched and mismatched rating scenarios. Finally, we present a
number of baseline models for this task, and offer guidelines for how
researchers can train, evaluate, and compare their own models. Our dataset and
evaluation code are publicly available: https://bit.ly/frmt-task
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting useful benchmarks from the large pool of available ones, addressing the oversight in previous work regarding benchmark utility.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists of adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
- Statistical Machine Translation for Indic Languages [1.8899300124593648]
This paper describes the development of bilingual Statistical Machine Translation models.
The MOSES open-source SMT toolkit is used to build the system.
In our experiment, the quality of the translation is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
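BLEU, mentioned above, scores a hypothesis by clipped n-gram precision against a reference, combined with a brevity penalty. A simplified pure-Python sketch of sentence-level BLEU follows; real evaluations should use a standard implementation (e.g. sacrebleu), since this toy version omits smoothing and multi-reference handling:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty. Illustrative only."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if clipped == 0:
            return 0.0  # any zero precision drives the geometric mean to zero
        log_precisions.append(math.log(clipped / total))
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(sum(log_precisions) / max_n)

score = bleu("the cat sat on the mat", "the cat sat on the mat")
print(f"BLEU (identical sentences): {score:.3f}")
```

METEOR and RIBES follow the same hypothesis-vs-reference pattern but add synonym/stem matching and word-order penalties, respectively.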
arXiv Detail & Related papers (2023-01-02T06:23:12Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", which discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora [19.02834713111249]
Backretrieval is shown to correlate with ground truth metrics on annotated datasets.
Our experiments conclude with a case study on a recipe dataset without parallel cross-lingual data.
arXiv Detail & Related papers (2021-05-11T12:14:24Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.