On Cross-Lingual Retrieval with Multilingual Text Encoders
- URL: http://arxiv.org/abs/2112.11031v1
- Date: Tue, 21 Dec 2021 08:10:27 GMT
- Title: On Cross-Lingual Retrieval with Multilingual Text Encoders
- Authors: Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš
- Abstract summary: We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
- Score: 51.60862829942932
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work we present a systematic empirical study focused on the
suitability of the state-of-the-art multilingual encoders for cross-lingual
document and sentence retrieval tasks across a number of diverse language
pairs. We first treat these models as multilingual text encoders and benchmark
their performance in unsupervised ad-hoc sentence- and document-level CLIR. In
contrast to supervised language understanding, our results indicate that for
unsupervised document-level CLIR -- a setup with no relevance judgments for
IR-specific fine-tuning -- pretrained multilingual encoders on average fail to
significantly outperform earlier models based on CLWEs. For sentence-level
retrieval, we do obtain state-of-the-art performance: the peak scores, however,
are met by multilingual encoders that have been further specialized, in a
supervised fashion, for sentence understanding tasks, rather than using their
vanilla 'off-the-shelf' variants. Following these results, we introduce
localized relevance matching for document-level CLIR, where we independently
score a query against document sections. In the second part, we evaluate
multilingual encoders fine-tuned in a supervised fashion (i.e., learning to
rank) on English relevance data in a series of zero-shot language and domain
transfer CLIR experiments. Our results show that supervised re-ranking rarely
improves over multilingual transformers used as unsupervised base rankers.
Only with in-domain contrastive fine-tuning (i.e., same domain, only language
transfer) do we manage to improve the ranking quality. We
uncover substantial empirical differences between cross-lingual retrieval
results and results of (zero-shot) cross-lingual transfer for monolingual
retrieval in target languages, which point to "monolingual overfitting" of
retrieval models trained on monolingual data.
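The localized relevance matching idea from the abstract (scoring a query independently against document sections) can be sketched as follows. This is a minimal illustration with stand-in 2-d embeddings and max-aggregation; the actual encoder and aggregation details of the paper may differ:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def localized_relevance(query_vec, section_vecs):
    """Score the query against each document section independently and
    aggregate with max: one highly relevant section is enough to make
    the whole document relevant."""
    return max(cosine(query_vec, s) for s in section_vecs)

# Toy example with 2-d stand-in embeddings; a real system would obtain
# these vectors from a multilingual text encoder.
query = np.array([1.0, 0.0])
sections = [np.array([0.0, 1.0]),   # irrelevant section
            np.array([0.8, 0.6])]   # partially relevant section
score = localized_relevance(query, sections)  # -> 0.8
```

Max-aggregation sidesteps the encoders' input-length limits for long documents, since each section is embedded and scored on its own.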
Related papers
- Do We Need Language-Specific Fact-Checking Models? The Case of Chinese [15.619421104102516]
This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese.
We first demonstrate the limitations of translation-based methods and multilingual large language models, highlighting the need for language-specific systems.
We propose a Chinese fact-checking system that can better retrieve evidence from a document by incorporating context information.
arXiv Detail & Related papers (2024-01-27T20:26:03Z) - Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
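The masked sentence prediction objective described above can be sketched as an InfoNCE-style contrastive loss over sentence vectors. This is a simplified stand-in, not MSM's exact hierarchical formulation, and all names are illustrative:

```python
import numpy as np

def contrastive_sentence_loss(predicted, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: the document encoder's prediction for a masked
    sentence position should be closer to the true sentence vector than to
    sampled negative sentence vectors."""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([sim(predicted, positive)] +
                      [sim(predicted, n) for n in negatives]) / temperature
    # Cross-entropy with the positive at index 0, computed stably.
    log_prob_pos = logits[0] - np.logaddexp.reduce(logits)
    return -float(log_prob_pos)

# A prediction aligned with the true sentence vector yields a near-zero loss;
# a prediction aligned with a negative yields a large loss.
good = contrastive_sentence_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                                 [np.array([0.0, 1.0])])
bad = contrastive_sentence_loss(np.array([0.0, 1.0]), np.array([1.0, 0.0]),
                                [np.array([0.0, 1.0])])
```

In MSM the `predicted` vector would come from the document encoder reading the surrounding sentence vectors, so the loss forces it to model sequential sentence relations.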
arXiv Detail & Related papers (2023-02-03T09:54:27Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not met using the general-purpose multilingual text encoders 'off-the-shelf', but rather by relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z) - XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.