NeuCLIRBench: A Modern Evaluation Collection for Monolingual, Cross-Language, and Multilingual Information Retrieval
- URL: http://arxiv.org/abs/2511.14758v1
- Date: Tue, 18 Nov 2025 18:58:19 GMT
- Title: NeuCLIRBench: A Modern Evaluation Collection for Monolingual, Cross-Language, and Multilingual Information Retrieval
- Authors: Dawn Lawrie, James Mayfield, Eugene Yang, Andrew Yates, Sean MacAvaney, Ronak Pradeep, Scott Miller, Paul McNamee, Luca Soldaini, et al.
- Abstract summary: This paper presents NeuCLIRBench, an evaluation collection for cross-language and multilingual retrieval. The collection consists of documents written in Chinese, Persian, and Russian, as well as those same documents machine translated into English. The collection supports several retrieval scenarios, including monolingual retrieval in English, Chinese, Persian, or Russian.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To measure advances in retrieval, test collections with relevance judgments that can faithfully distinguish systems are required. This paper presents NeuCLIRBench, an evaluation collection for cross-language and multilingual retrieval. The collection consists of documents written natively in Chinese, Persian, and Russian, as well as those same documents machine translated into English. The collection supports several retrieval scenarios including: monolingual retrieval in English, Chinese, Persian, or Russian; cross-language retrieval with English as the query language and one of the other three languages as the document language; and multilingual retrieval, again with English as the query language and relevant documents in all three languages. NeuCLIRBench combines the TREC NeuCLIR track topics of 2022, 2023, and 2024. The 250,128 judgments across approximately 150 queries for the monolingual and cross-language tasks and 100 queries for multilingual retrieval provide strong statistical discriminatory power to distinguish retrieval approaches. A fusion baseline of strong neural retrieval systems is included with the collection so that developers of reranking algorithms are no longer reliant on BM25 as their first-stage retriever. NeuCLIRBench is publicly available.
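The abstract mentions a fusion baseline built from strong neural retrieval systems, without specifying the fusion method. Reciprocal rank fusion (RRF) is one common way such a first-stage baseline is assembled; the sketch below is illustrative only, with hypothetical run and document IDs, and is not the collection's actual baseline.

```python
from collections import defaultdict

def reciprocal_rank_fusion(runs, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    `runs` is a list of rankings, each a list of doc IDs ordered
    best-first. `k` dampens the contribution of top ranks; 60 is
    the value commonly used in the RRF literature.
    """
    scores = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Documents appearing near the top of several runs rise to the top.
    return sorted(scores, key=scores.get, reverse=True)

# Two hypothetical first-stage runs for the same query:
run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([run_a, run_b])  # ['d2', 'd1', 'd4', 'd3']
```

A reranker developer would feed `fused` (rather than a plain BM25 run) to the second stage, which is exactly the workflow the collection's included baseline is meant to support.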
Related papers
- NeuCLIRTech: Chinese Monolingual and Cross-Language Information Retrieval Evaluation in a Challenging Domain [49.3943974580576]
This paper presents NeuCLIRTech, an evaluation collection for cross-language retrieval over technical information. The collection consists of technical documents written in Chinese and those same documents machine translated into English. The collection supports two retrieval scenarios: monolingual retrieval in Chinese, and cross-language retrieval with English as the query language.
arXiv Detail & Related papers (2026-02-05T05:57:55Z)
- One Instruction Does Not Fit All: How Well Do Embeddings Align Personas and Instructions in Low-Resource Indian Languages? [1.071318785217926]
We present a benchmark spanning 12 Indian languages and four evaluation tasks. E5-Large-Instruct achieves the highest Recall@1 of 27.4% on monolingual retrieval and 20.7% on cross-lingual transfer. For classification, LaBSE attains 75.3% AUROC with strong calibration.
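Recall@1, the metric reported above, is simply the fraction of queries whose top-ranked result is relevant. A minimal sketch, with hypothetical query and document IDs (not data from the benchmark):

```python
def recall_at_1(retrieved, relevant):
    """Fraction of queries whose top-ranked document is relevant.

    `retrieved` maps query IDs to ranked doc-ID lists (best first);
    `relevant` maps query IDs to sets of relevant doc IDs.
    """
    hits = sum(
        1
        for qid, ranking in retrieved.items()
        if ranking and ranking[0] in relevant.get(qid, set())
    )
    return hits / len(retrieved)

retrieved = {"q1": ["d3", "d1"], "q2": ["d7", "d2"]}
relevant = {"q1": {"d3"}, "q2": {"d2"}}
score = recall_at_1(retrieved, relevant)  # q1 hit, q2 miss -> 0.5
```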
arXiv Detail & Related papers (2026-01-15T09:10:14Z)
- Bridging Language Gaps: Advances in Cross-Lingual Information Retrieval with Multilingual LLMs [0.19116784879310025]
Cross-lingual information retrieval (CLIR) addresses the challenge of retrieving relevant documents written in languages different from that of the original query. Recent advances have shifted from translation-based methods toward embedding-based approaches. This survey provides a comprehensive overview of developments from early translation-based methods to state-of-the-art embedding-driven and generative techniques.
arXiv Detail & Related papers (2025-10-01T13:50:05Z)
- VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding [49.07705729597171]
VisR-Bench is a benchmark for question-driven multimodal retrieval in long documents. Our benchmark comprises over 35K high-quality QA pairs across 1.2K documents. We evaluate various retrieval models, including text-based methods, multimodal encoders, and MLLMs.
arXiv Detail & Related papers (2025-08-10T21:44:43Z)
- CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents [2.0277446818410994]
This paper presents CLIRudit, a new dataset created to evaluate cross-lingual academic search. The dataset is built using bilingual article metadata from Érudit, a Canadian publishing platform.
arXiv Detail & Related papers (2025-04-22T20:55:08Z)
- mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models. We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance. We see strong cross-lingual performance from English-based retrievers that were trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z)
- XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples [64.79218405438871]
We introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning. XAMPLER first trains a retriever based on Glot500, a multilingual small language model. It can directly retrieve English examples as few-shot examples for in-context learning of target languages.
arXiv Detail & Related papers (2024-05-08T15:13:33Z)
- Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z)
- Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval [50.882816288076725]
Cross-lingual information retrieval is the task of searching documents in one language with queries in another.
We provide a conceptual framework for organizing different approaches to cross-lingual retrieval, using multi-stage architectures for monolingual retrieval as a scaffold.
We implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese.
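The toolkits mentioned here typically use BM25 as the lexical first stage that later neural rerankers refine. The scoring function below is a self-contained illustration of the classic BM25 formula over a toy tokenized corpus, not code from Anserini or Pyserini; the `k1=0.9, b=0.4` defaults match the values Anserini commonly uses.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=0.9, b=0.4):
    """Score one tokenized document against a query with classic BM25.

    `corpus` is the full list of tokenized documents, used for the
    document-frequency and average-length statistics.
    """
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term absent from the corpus contributes nothing
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        freq = tf[term]
        # Saturating term-frequency component with length normalization.
        norm = freq * (k1 + 1) / (
            freq + k1 * (1 - b + b * len(doc_terms) / avgdl)
        )
        score += idf * norm
    return score

docs = [
    ["cross", "lingual", "retrieval"],
    ["monolingual", "retrieval"],
    ["machine", "translation"],
]
query = ["cross", "lingual"]
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
```

In a multi-stage pipeline, the top documents of `ranked` would then be handed to a neural reranker; real systems of course use an inverted index rather than this linear scan.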
arXiv Detail & Related papers (2023-04-03T14:17:00Z)
- Cross-Lingual Training with Dense Retrieval for Document Retrieval [56.319511218754414]
We explore different transfer techniques for document ranking from English annotations to multiple non-English languages.
We run experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families. We find that weakly-supervised target language transfer yields performance competitive with generation-based target language transfer.
arXiv Detail & Related papers (2021-09-03T17:15:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.