Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in
Multilingual Language Models
- URL: http://arxiv.org/abs/2402.02099v1
- Date: Sat, 3 Feb 2024 09:41:52 GMT
- Title: Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in
Multilingual Language Models
- Authors: Sara Rajaee and Christof Monz
- Abstract summary: We show that observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge.
More specifically, we observe that what has been transferred across languages is mostly data artifacts and biases, especially for low-resource languages.
- Score: 12.662039551306632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in training multilingual language models on large datasets
seem to have shown promising results in knowledge transfer across languages,
achieving high performance on downstream tasks. However, we question to what
extent the current evaluation benchmarks and setups accurately measure
zero-shot cross-lingual knowledge transfer. In this work, we challenge the
assumption that high zero-shot performance on target tasks reflects high
cross-lingual ability by introducing more challenging setups involving
instances with multiple languages. Through extensive experiments and analysis,
we show that the observed high performance of multilingual models can be
largely attributed to factors not requiring the transfer of actual linguistic
knowledge, such as task- and surface-level knowledge. More specifically, we
observe that what has been transferred across languages is mostly data artifacts and
biases, especially for low-resource languages. Our findings highlight the
overlooked drawbacks of existing cross-lingual test data and evaluation setups,
calling for a more nuanced understanding of the cross-lingual capabilities of
multilingual models.
Related papers
- GradSim: Gradient-Based Language Grouping for Effective Multilingual
Training [13.730907708289331]
We propose GradSim, a language grouping method based on gradient similarity.
Our experiments on three diverse multilingual benchmark datasets show that GradSim yields the largest performance gains among the grouping strategies compared.
Besides linguistic features, the topics of the datasets play an important role for language grouping.
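The grouping idea can be sketched as a pairwise comparison of per-language gradient directions. A minimal sketch follows, assuming cosine similarity over flattened gradient vectors with a greedy threshold-based grouping; the toy gradients, the threshold value, and the greedy procedure are illustrative assumptions, not the paper's actual algorithm:

```python
import math

def cosine(u, v):
    # Cosine similarity between two flat gradient vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_languages(grads, threshold=0.5):
    # Greedily assign each language to the first existing group whose
    # representative gradient points in a similar direction; otherwise
    # start a new group. `threshold` is a hypothetical cutoff.
    groups = []
    for lang, g in grads.items():
        for group in groups:
            if cosine(g, grads[group[0]]) >= threshold:
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups

# Toy per-language gradient vectors (hypothetical values).
grads = {
    "de": [1.0, 0.9, 0.1],
    "nl": [0.9, 1.0, 0.0],
    "fi": [-0.8, 0.1, 1.0],
    "et": [-0.9, 0.2, 0.9],
}
print(group_languages(grads))  # [['de', 'nl'], ['fi', 'et']]
```

Languages whose task gradients conflict (negative cosine similarity) end up in separate groups, so training within a group avoids gradient interference.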
arXiv Detail & Related papers (2023-10-23T18:13:37Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using a single multilingual dense retrieval design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - First Align, then Predict: Understanding the Cross-Lingual Ability of
Multilingual BERT [2.2931318723689276]
Cross-lingual transfer arises when a model is fine-tuned on a task of interest in one language and evaluated on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor is of little importance for the transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
XTREME, the Cross-lingual TRansfer Evaluation of Multilingual Encoders benchmark, evaluates the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.