XL-WiC: A Multilingual Benchmark for Evaluating Semantic
Contextualization
- URL: http://arxiv.org/abs/2010.06478v1
- Date: Tue, 13 Oct 2020 15:32:00 GMT
- Title: XL-WiC: A Multilingual Benchmark for Evaluating Semantic
Contextualization
- Authors: Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad
Taher Pilehvar
- Abstract summary: The Word-in-Context dataset (WiC) assesses the ability to correctly model distinct meanings of a word, but is limited to English.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to correctly model distinct meanings of a word is crucial for the
effectiveness of semantic representation techniques. However, most existing
evaluation benchmarks for assessing this criterion are tied to sense
inventories (usually WordNet), restricting their usage to a small subset of
knowledge-based representation techniques. The Word-in-Context dataset (WiC)
addresses the dependence on sense inventories by reformulating the standard
disambiguation task as a binary classification problem, but it is limited to
the English language. We put forward a large multilingual benchmark, XL-WiC,
featuring gold standards in 12 new languages from varied language families and
with different degrees of resource availability, opening room for evaluation
scenarios such as zero-shot cross-lingual transfer. We perform a series of
experiments to determine the reliability of the datasets and to set performance
baselines for several recent contextualized multilingual models. Experimental
results show that even when no tagged instances are available for a target
language, models trained solely on the English data can attain competitive
performance in the task of distinguishing different meanings of a word, even
for distant languages. XL-WiC is available at
https://pilehvar.github.io/xlwic/.
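The binary reformulation described in the abstract can be sketched as follows. This is a minimal illustration with toy vectors, not the paper's implementation: in practice the two embeddings would be the target word's contextual representations from a multilingual encoder (e.g. XLM-R), and the threshold would be tuned on a development set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_sense(emb1, emb2, threshold=0.7):
    """WiC-style binary decision: the target word is judged to carry the
    same meaning in both contexts iff its two contextual embeddings are
    sufficiently close. The threshold here is a hypothetical value."""
    return cosine(emb1, emb2) >= threshold

# Toy contextual embeddings for the word "bank" in two sentences.
river_bank = [0.9, 0.1, 0.2]
money_bank = [0.1, 0.95, 0.15]
assert same_sense(river_bank, river_bank)      # identical contexts: same sense
assert not same_sense(river_bank, money_bank)  # divergent contexts: different sense
```

Because the decision needs no sense inventory, the same classifier transfers directly to any of the 12 XL-WiC languages once the encoder produces embeddings for them.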
Related papers
- A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets [1.1647644386277962]
Typologically diverse benchmarks are increasingly created to track the progress achieved in multilingual NLP.
We propose assessing linguistic diversity of a data set against a reference language sample.
arXiv Detail & Related papers (2024-03-06T18:14:22Z)
- X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity [19.15213046428148]
Cross-lingual transfer (XLT) is the ability of multilingual language models to largely preserve their task performance when evaluated in languages that were not included in the fine-tuning process.
We propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT.
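One way to picture the sub-network-similarity proxy is as overlap between the sets of weights each language's sub-network retains. This is a toy sketch only: the `subnetwork_similarity` helper is hypothetical, and Jaccard overlap over weight-index sets stands in for whatever similarity measure the paper actually uses.

```python
def subnetwork_similarity(kept_a, kept_b):
    """Jaccard overlap between two sub-networks, each represented as the
    set of weight indices it retains. Higher overlap is taken as a proxy
    for better cross-lingual transfer between the two languages."""
    a, b = set(kept_a), set(kept_b)
    return len(a & b) / len(a | b)

# Hypothetical sub-networks extracted for three languages.
sub_de = {0, 1, 2, 5, 8}
sub_nl = {0, 1, 2, 5, 9}   # related language: large overlap with German
sub_zh = {3, 4, 6, 7, 9}   # distant language: no overlap with German

score_nl = subnetwork_similarity(sub_de, sub_nl)  # 4/6
score_zh = subnetwork_similarity(sub_de, sub_zh)  # 0/10
assert score_nl > score_zh  # predicts nl as the more compatible source language
```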
arXiv Detail & Related papers (2023-10-26T05:39:49Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)