Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
- URL: http://arxiv.org/abs/2108.08787v1
- Date: Thu, 19 Aug 2021 16:53:43 GMT
- Title: Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
- Authors: Xinyu Zhang, Xueguang Ma, Peng Shi, and Jimmy Lin
- Abstract summary: Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
- Score: 51.004601358498135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual
retrieval in eleven typologically diverse languages, designed to evaluate
ranking with learned dense representations. The goal of this resource is to
spur research in dense retrieval techniques in non-English languages, motivated
by recent observations that existing techniques for representation learning
perform poorly when applied to out-of-distribution data. As a starting point,
we provide zero-shot baselines for this new dataset based on a multi-lingual
adaptation of DPR that we call "mDPR". Experiments show that although the
effectiveness of mDPR is much lower than that of BM25, dense representations
nevertheless appear to provide valuable relevance signals, improving BM25
results in sparse-dense hybrids. In addition to analyses of our results, we
also discuss future challenges and present a research agenda in multi-lingual
dense retrieval. Mr. TyDi can be downloaded at
https://github.com/castorini/mr.tydi.
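The sparse-dense hybrid described in the abstract is typically realized as a weighted interpolation of BM25 and dense-retriever scores over the union of both candidate lists. A minimal sketch follows; the function name, min-max normalization, and the weight `alpha` are illustrative assumptions, not the paper's exact fusion recipe:

```python
def hybrid_fuse(sparse_hits, dense_hits, alpha=0.5, k=10):
    """Fuse sparse (BM25) and dense (mDPR-style) rankings by weighted
    score interpolation; documents missing from one list score 0.0 there.

    sparse_hits / dense_hits: dict mapping doc_id -> retrieval score.
    alpha: weight on the dense score (illustrative default).
    """
    def normalize(hits):
        # Min-max normalize to [0, 1] so the two score scales are comparable.
        if not hits:
            return {}
        lo, hi = min(hits.values()), max(hits.values())
        span = (hi - lo) or 1.0
        return {doc: (score - lo) / span for doc, score in hits.items()}

    sparse = normalize(sparse_hits)
    dense = normalize(dense_hits)
    fused = {
        doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * sparse.get(doc, 0.0)
        for doc in set(sparse) | set(dense)
    }
    # Return the top-k (doc_id, fused_score) pairs.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

In practice the interpolation weight is tuned on a development set per language, and score normalization matters because BM25 and dense inner-product scores live on very different scales.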
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting a reasonable subset of benchmarks from the many available, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- What are the limits of cross-lingual dense passage retrieval for low-resource languages? [23.88853455670863]
We analyze the capabilities of the multilingual Dense Passage Retriever (mDPR) for extremely low-resource languages.
mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training.
We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer.
arXiv Detail & Related papers (2024-08-21T18:51:46Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Unsupervised Multilingual Dense Retrieval via Generative Pseudo Labeling [32.10366004426449]
This paper introduces UMR, an Unsupervised dense Multilingual Retriever trained without any paired data.
We propose a two-stage framework which iteratively improves the performance of multilingual dense retrievers.
arXiv Detail & Related papers (2024-03-06T07:49:06Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.