Building Russian Benchmark for Evaluation of Information Retrieval Models
- URL: http://arxiv.org/abs/2504.12879v1
- Date: Thu, 17 Apr 2025 12:11:14 GMT
- Title: Building Russian Benchmark for Evaluation of Information Retrieval Models
- Authors: Grigory Kovalev, Mikhail Tikhomirov, Evgeny Kozhevnikov, Max Kornilov, Natalia Loukachevitch
- Abstract summary: RusBEIR is a benchmark for the evaluation of information retrieval models in the Russian language. It integrates adapted, translated, and newly created datasets, enabling comparison of lexical and neural models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce RusBEIR, a comprehensive benchmark designed for zero-shot evaluation of information retrieval (IR) models in the Russian language. Comprising 17 datasets from various domains, it integrates adapted, translated, and newly created datasets, enabling systematic comparison of lexical and neural models. Our study highlights the importance of preprocessing for lexical models in morphologically rich languages and confirms BM25 as a strong baseline for full-document retrieval. Neural models, such as mE5-large and BGE-M3, demonstrate superior performance on most datasets, but face challenges with long-document retrieval due to input size constraints. RusBEIR offers a unified, open-source framework that promotes research in Russian-language information retrieval.
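As a concrete illustration of the preprocessing point for morphologically rich languages, the sketch below lemmatizes Russian text before BM25 scoring so that inflected word forms match. It is only an illustration, not code from RusBEIR; the toy corpus and the third-party packages pymorphy2 and rank_bm25 are assumptions.

```python
# Illustrative sketch (not from RusBEIR): lemmatized BM25 over a toy Russian corpus.
# Assumes the third-party packages pymorphy2 and rank_bm25 are installed.
import re

import pymorphy2
from rank_bm25 import BM25Okapi

morph = pymorphy2.MorphAnalyzer()

def lemmatize(text: str) -> list[str]:
    """Lowercase, tokenize, and map each token to its normal form (lemma)."""
    return [morph.parse(token)[0].normal_form for token in re.findall(r"\w+", text.lower())]

corpus = [
    "Поисковые системы ранжируют документы по запросу.",
    "Нейронные модели кодируют тексты в плотные векторы.",
]
bm25 = BM25Okapi([lemmatize(doc) for doc in corpus])

# Without lemmatization, the inflected query form "документов" would not match
# the surface form "документы" in the first document.
print(bm25.get_scores(lemmatize("ранжирование документов")))
```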
Related papers
- Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization [0.0]
The research aims to improve retrieval and generation accuracy by introducing Persian-specific models. Three datasets, covering general knowledge (PQuad), scientifically specialized texts, and organizational reports, were used to assess these models. MatinaSRoberta outperformed previous embeddings, achieving superior contextual relevance and retrieval accuracy across datasets.
arXiv Detail & Related papers (2025-01-08T22:16:40Z) - RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation [2.3577273565334522]
RoLargeSum is a novel large-scale summarization dataset for the Romanian language.
It was crawled from various publicly available news websites in Romania and the Republic of Moldova.
arXiv Detail & Related papers (2024-12-15T21:27:33Z) - The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design [39.80182519545138]
This paper focuses on research related to embedding models for the Russian language. It introduces a new Russian-focused embedding model called ru-en-RoSBERTa and the ruMTEB benchmark.
arXiv Detail & Related papers (2024-08-22T15:53:23Z) - MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities with multigranular annotations for more than 65 diseases.
Unlike existing multimodal datasets, which are limited by the availability of image-text pairs, the dataset was built with the first automated pipeline for generating these annotations.
arXiv Detail & Related papers (2024-08-06T02:09:35Z) - AutoBencher: Towards Declarative Benchmark Construction [74.54640925146289]
We use AutoBencher to create datasets for math, multilinguality, knowledge, and safety. The scalability of AutoBencher allows it to test fine-grained categories of knowledge, creating datasets that elicit 22% more model errors (i.e., difficulty) than existing benchmarks.
arXiv Detail & Related papers (2024-07-11T10:03:47Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - BEIR-PL: Zero Shot Information Retrieval Benchmark for the Polish Language [4.720913027054481]
In this work, inspired by the mMARCO and Mr.TyDi datasets, we translated all accessible open IR datasets into Polish.
We introduced BEIR-PL, a new benchmark comprising 13 datasets.
We executed an evaluation and comparison of numerous IR models on the newly introduced BEIR-PL benchmark.
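Because BEIR-PL (like RusBEIR above) keeps the BEIR data format, zero-shot evaluation of a retriever typically follows the pattern sketched below. The dataset path is a placeholder and the choice of multilingual-e5-large is an assumption, not the paper's exact setup.

```python
# Illustrative sketch of zero-shot evaluation on a BEIR-format dataset.
# Assumes the `beir` package is installed; the dataset path is a placeholder.
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Load corpus, queries, and relevance judgments in BEIR format.
corpus, queries, qrels = GenericDataLoader("path/to/beir-pl-dataset").load(split="test")

# Wrap a SentenceTransformers model as a dense retriever (model choice is an assumption;
# E5 models normally expect "query: " / "passage: " prefixes for best results).
dense_model = DRES(models.SentenceBERT("intfloat/multilingual-e5-large"), batch_size=64)
retriever = EvaluateRetrieval(dense_model, score_function="cos_sim")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # NDCG@k, the metric usually reported for BEIR-style benchmarks
```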
arXiv Detail & Related papers (2023-05-31T13:29:07Z) - Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as a proxy for the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through their natural language counterparts, thus achieving disentanglement.
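For intuition only, the sketch below shows one generic way to represent text in a vocabulary space with a pretrained masked language model: pool per-token vocabulary logits into a single non-negative vector whose dimensions correspond to vocabulary terms. The checkpoint, pooling, and activation are illustrative assumptions, not VDR's actual architecture or training objective.

```python
# Generic illustration of vocabulary-space text encoding (not VDR's exact method).
# Assumes torch and transformers are installed; the checkpoint is an arbitrary choice.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def encode_in_vocab_space(text: str) -> torch.Tensor:
    """Return a |V|-dimensional non-negative vector; each dimension scores one vocabulary term."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = mlm(**inputs).logits                        # (1, seq_len, |V|)
    # Max-pool term scores over token positions; ReLU + log1p keeps the vector mostly sparse.
    pooled, _ = torch.log1p(torch.relu(logits)).max(dim=1)   # (1, |V|)
    return pooled.squeeze(0)

query_vec = encode_in_vocab_space("what supervises disentangled representations?")
doc_vec = encode_in_vocab_space("Natural language can act as a proxy for factors of data variation.")
print(torch.dot(query_vec, doc_vec).item())  # relevance score with interpretable per-term dimensions
```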
arXiv Detail & Related papers (2022-12-15T10:20:42Z) - Building Machine Translation Systems for the Next Thousand Languages [102.24310122155073]
We describe results in three research domains: building clean, web-mined datasets for 1500+ languages, developing practical MT models for under-served languages, and studying the limitations of evaluation metrics for these languages.
We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.
arXiv Detail & Related papers (2022-05-09T00:24:13Z) - Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models [53.95094814056337]
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models.
The new version includes a number of technical, user-experience, and methodological improvements.
We also integrate Russian SuperGLUE with MOROCCO, a framework for industrial evaluation of open-source models.
arXiv Detail & Related papers (2022-02-15T23:45:30Z) - RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes an open Russian medical language understanding benchmark covering several task types.
We prepare unified-format labeling, data splits, and evaluation metrics for the new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
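As a hedged illustration, one simple single-number summary is the macro-average of per-task scores; the task names, values, and aggregation below are assumptions, not necessarily RuMedBench's exact formula.

```python
# Toy illustration of collapsing per-task results into one benchmark number
# (a plain macro-average; the benchmark's exact aggregation may differ).
task_scores = {"ner": 0.71, "qa": 0.58, "nli": 0.64, "symptom_coding": 0.49}  # hypothetical values
overall = sum(task_scores.values()) / len(task_scores)
print(f"benchmark score: {overall:.3f}")
```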
arXiv Detail & Related papers (2022-01-17T16:23:33Z) - Leveraging Advantages of Interactive and Non-Interactive Models for
Vector-Based Cross-Lingual Information Retrieval [12.514666775853598]
We propose a novel framework to leverage the advantages of interactive and non-interactive models.
We introduce a semi-interactive mechanism, which builds on a non-interactive architecture but encodes each document together with its associated multilingual queries.
Our methods significantly boost the retrieval accuracy while maintaining the computational efficiency.
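A rough sketch of that idea appears below: fold a document's known multilingual queries into its offline representation, so online scoring stays a cheap vector comparison. The model, concatenation scheme, and example texts are illustrative assumptions, not the paper's implementation.

```python
# Rough sketch of the semi-interactive idea (not the paper's implementation):
# encode each document together with its associated multilingual queries offline.
# Assumes the sentence-transformers package is installed; the model is an arbitrary choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

document = "Статья о поиске информации по запросам на разных языках."
associated_queries = ["cross-lingual information retrieval", "многоязычный поиск"]

# Offline: one joint encoding of the document plus its known queries.
doc_vector = model.encode(" ".join([document] + associated_queries), convert_to_tensor=True)

# Online: encode only the incoming query and score it non-interactively.
query_vector = model.encode("retrieving documents across languages", convert_to_tensor=True)
print(util.cos_sim(query_vector, doc_vector).item())
```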
arXiv Detail & Related papers (2021-11-03T03:03:19Z) - Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
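To make the entity-masked part concrete, the snippet below shows only the preprocessing step of replacing entity mentions with a mask token; the spans, mask token, and example are assumptions, and the contrastive objective itself is omitted.

```python
# Minimal sketch of entity masking for relation-extraction pre-training
# (only the masking step; the contrastive objective is omitted).
def mask_entities(text: str, entity_spans: list[tuple[int, int]], mask_token: str = "[MASK]") -> str:
    """Replace each (start, end) character span with the mask token, right to left."""
    for start, end in sorted(entity_spans, reverse=True):
        text = text[:start] + mask_token + text[end:]
    return text

sentence = "Marie Curie was born in Warsaw."
spans = [(0, 11), (24, 30)]  # hypothetical character offsets of the two entity mentions
print(mask_entities(sentence, spans))  # "[MASK] was born in [MASK]."
```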
arXiv Detail & Related papers (2020-10-05T11:21:59Z)