The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual
Machine Translation
- URL: http://arxiv.org/abs/2106.03193v1
- Date: Sun, 6 Jun 2021 17:58:12 GMT
- Title: The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual
Machine Translation
- Authors: Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume
Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman,
Angela Fan
- Abstract summary: In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia.
The resulting dataset enables better assessment of model quality on the long tail of low-resource languages.
- Score: 21.55080388385458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the biggest challenges hindering progress in low-resource and
multilingual machine translation is the lack of good evaluation benchmarks.
Current evaluation benchmarks either lack good coverage of low-resource
languages, consider only restricted domains, or are low quality because they
are constructed using semi-automatic procedures. In this work, we introduce the
FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from
English Wikipedia and covering a variety of different topics and domains. These
sentences have been translated into 101 languages by professional translators
through a carefully controlled process. The resulting dataset enables better
assessment of model quality on the long tail of low-resource languages,
including the evaluation of many-to-many multilingual translation systems, as
all translations are multilingually aligned. By publicly releasing such a
high-quality and high-coverage dataset, we hope to foster progress in the
machine translation community and beyond.
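The multilingual alignment described above is what enables many-to-many evaluation: because every language version contains the same sentences in the same order, any ordered pair of the 101 languages forms a valid evaluation direction. A minimal sketch of this idea with toy data (the language codes and sentences below are illustrative stand-ins, not taken from the dataset):

```python
from itertools import permutations

# Toy stand-in for FLORES-101: each language holds the same sentences in
# the same order, so row i is mutually aligned across all languages.
# (The real benchmark has 3001 sentences in 101 languages.)
aligned = {
    "eng": ["The cat sleeps.", "Rain is expected tomorrow."],
    "fra": ["Le chat dort.", "De la pluie est attendue demain."],
    "swh": ["Paka analala.", "Mvua inatarajiwa kesho."],
}

def eval_pairs(data, src, tgt):
    """Return (source sentence, reference translation) pairs for one
    translation direction, relying on the shared row-wise alignment."""
    return list(zip(data[src], data[tgt]))

# Every ordered language pair is a valid evaluation direction; with the
# full benchmark's 101 languages that is 101 * 100 = 10,100 directions.
directions = list(permutations(aligned, 2))
```

With 3 toy languages this yields 6 directions; the same construction scales to all 10,100 directions of the full benchmark without any extra annotation work.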
Related papers
- On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations? [19.346078451375693]
We present an analysis of existing evaluation frameworks in NLP.
We propose several directions for more robust and reliable evaluation practices.
We show that simpler baselines can achieve relatively strong performance without having benefited from large-scale multilingual pretraining.
arXiv Detail & Related papers (2024-06-20T12:46:12Z)
- Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian.
Our experiment entails applying Back-translation and Transfer Learning to automatically generate more training data and achieve higher translation performance.
Statistical significance tests with Bonferroni correction show surprisingly strong baseline systems, and that Back-translation leads to significant improvement.
arXiv Detail & Related papers (2024-04-12T06:16:26Z)
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning [48.15259834021655]
We present a pragmatic approach towards building a multilingual machine translation model that covers hundreds of languages.
We use a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs.
We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting.
arXiv Detail & Related papers (2022-01-09T23:36:44Z)
- AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages [94.75849612191546]
AfroMT is a standardized, clean, and reproducible machine translation benchmark for eight widely spoken African languages.
We develop a suite of analysis tools for system diagnosis taking into account the unique properties of these languages.
We demonstrate significant improvements when pretraining on 11 languages, with gains of up to 2 BLEU points over strong baselines.
arXiv Detail & Related papers (2021-09-10T07:45:21Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.