XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation
- URL: http://arxiv.org/abs/2104.07412v1
- Date: Thu, 15 Apr 2021 12:26:12 GMT
- Title: XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation
- Authors: Sebastian Ruder, Noah Constant, Jan Botha, Aditya Siddhant, Orhan
Firat, Jinlan Fu, Pengfei Liu, Junjie Hu, Graham Neubig, Melvin Johnson
- Abstract summary: This paper analyzes the current state of cross-lingual transfer learning.
We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
- Score: 93.80733419450225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has brought striking advances in multilingual natural
language processing capabilities over the past year. For example, the latest
techniques have improved the state-of-the-art performance on the XTREME
multilingual benchmark by more than 13 points. While a sizeable gap to
human-level performance remains, improvements have been easier to achieve in
some tasks than in others. This paper analyzes the current state of
cross-lingual transfer learning and summarizes some lessons learned. In order
to catalyze meaningful progress, we extend XTREME to XTREME-R, which consists
of an improved set of ten natural language understanding tasks, including
challenging language-agnostic retrieval tasks, and covers 50 typologically
diverse languages. In addition, we provide a massively multilingual diagnostic
suite and fine-grained multi-dataset evaluation capabilities through an
interactive public leaderboard to gain a better understanding of such models.
Related papers
- BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual
Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format.
BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer.
Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z) - Not All Languages Are Created Equal in LLMs: Improving Multilingual
Capability by Cross-Lingual-Thought Prompting [123.16452714740106]
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
We introduce a simple yet effective method, called cross-lingual-thought prompting (XLT)
XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages.
arXiv Detail & Related papers (2023-05-11T17:44:17Z) - XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual
Understanding (XLU) [0.0]
We focus on improving the original XNLI dataset by re-translating the MNLI dataset in all of the 14 different languages present in XNLI.
We also perform experiments by training models in all 15 languages and analyzing their performance on the task of natural language inference.
arXiv Detail & Related papers (2023-01-16T17:24:57Z) - The Impact of Cross-Lingual Adjustment of Contextual Word
Representations on Zero-Shot Transfer [3.300216758849348]
Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks.
We propose a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a small parallel corpus to make embeddings of related words across languages similar to each other.
We experiment with a typologically diverse set of languages (Spanish, Russian, Vietnamese, and Hindi) and extend their original implementations to new tasks.
Our study reproduced gains in NLI for four languages, showed improved NER, XSR, and cross-lingual QA
arXiv Detail & Related papers (2022-04-13T15:28:43Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Syntax-augmented Multilingual BERT for Cross-lingual Transfer [37.99210035238424]
This work shows that explicitly providing language syntax and training mBERT helps cross-lingual transfer.
Experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks.
arXiv Detail & Related papers (2021-06-03T21:12:50Z) - X-METRA-ADA: Cross-lingual Meta-Transfer Learning Adaptation to Natural
Language Understanding and Question Answering [55.57776147848929]
We propose X-METRA-ADA, a cross-lingual MEta-TRAnsfer learning ADAptation approach for Natural Language Understanding (NLU)
Our approach adapts MAML, an optimization-based meta-learning approach, to learn to adapt to new languages.
We show that our approach outperforms naive fine-tuning, reaching competitive performance on both tasks for most languages.
arXiv Detail & Related papers (2021-04-20T00:13:35Z) - XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
Cross-lingual Generalization [128.37244072182506]
Cross-lingual TRansfer Evaluation of Multilinguals XTREME is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.