Resources for Brewing BEIR: Reproducible Reference Models and an
Official Leaderboard
- URL: http://arxiv.org/abs/2306.07471v1
- Date: Tue, 13 Jun 2023 00:26:18 GMT
- Title: Resources for Brewing BEIR: Reproducible Reference Models and an
Official Leaderboard
- Authors: Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma,
Jheng-Hong Yang, Jimmy Lin
- Abstract summary: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations.
Our work addresses two shortcomings that prevent the benchmark from achieving its full potential.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval
models across 18 different domain/task combinations. In recent years, we have
witnessed the growing popularity of a representation learning approach to
building retrieval models, typically using pretrained transformers in a
supervised setting. This naturally raises the question: How effective are these
models when presented with queries and documents that differ from the training
data? Examples include searching in different domains (e.g., medical or legal
text) and with different types of queries (e.g., keywords vs. well-formed
questions). While BEIR was designed to answer these questions, our work
addresses two shortcomings that prevent the benchmark from achieving its full
potential: First, the sophistication of modern neural methods and the
complexity of current software infrastructure create barriers to entry for
newcomers. To this end, we provide reproducible reference implementations that
cover the two main classes of approaches: learned dense and sparse models.
Second, there does not exist a single authoritative nexus for reporting the
effectiveness of different models on BEIR, which has led to difficulty in
comparing different methods. To remedy this, we present an official
self-service BEIR leaderboard that provides fair and consistent comparisons of
retrieval models. By addressing both shortcomings, our work facilitates future
explorations in a range of interesting research questions that BEIR enables.
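At query time, both classes of reference models the abstract mentions score a document as an inner product between query and document representations: dense models compare low-dimensional learned embeddings, while learned sparse models compare learned weights over vocabulary terms. A minimal sketch of this shared scoring step, with toy hand-made vectors standing in for real model outputs (all names and values here are illustrative, not from the paper):

```python
def dense_score(q_emb, d_emb):
    """Dense retrieval: dot product of low-dimensional learned embeddings."""
    return sum(q * d for q, d in zip(q_emb, d_emb))

def sparse_score(q_weights, d_weights):
    """Learned sparse retrieval: dot product over shared vocabulary terms."""
    return sum(w * d_weights.get(term, 0.0) for term, w in q_weights.items())

# Toy query/document representations (stand-ins for real encoder outputs).
q_dense, d_dense = [0.1, 0.9, 0.3], [0.2, 0.8, 0.0]
q_sparse = {"beir": 1.5, "benchmark": 0.7}
d_sparse = {"beir": 1.2, "retrieval": 0.9}

print(round(dense_score(q_dense, d_dense), 3))    # 0.74
print(round(sparse_score(q_sparse, d_sparse), 3))  # 1.8
```

In both cases ranking reduces to sorting documents by this score, which is why the same indexing and search infrastructure can serve both model classes.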
Related papers
- Teaching Smaller Language Models To Generalise To Unseen Compositional Questions (Full Thesis)
  We train our models to answer diverse questions by instilling an ability to reason over a retrieved context.
  We acquire context from two knowledge sources: a Wikipedia corpus queried using a multi-hop dense retrieval system with novel extensions, and rationales generated by a larger language model optimised to run in a lower-resource environment.
  arXiv Detail & Related papers (2024-11-25T23:25:34Z)
- List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation
  We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
  GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
  Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
  arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
  Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
  This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
  arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- RelVAE: Generative Pretraining for Few-shot Visual Relationship Detection
  We present the first pretraining method for few-shot predicate classification that does not require any annotated relations.
  We construct few-shot training splits and show quantitative experiments on the VG200 and VRD datasets.
  arXiv Detail & Related papers (2023-11-27T19:08:08Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph
  Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
  We propose UniKGQA, a novel approach for the multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
  arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking
  We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
  To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
  arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
  We introduce BEIR, a heterogeneous benchmark for information retrieval.
  We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup.
  Dense-retrieval models are computationally more efficient but often underperform other approaches.
  arXiv Detail & Related papers (2021-04-17T23:29:55Z)
- A Neural Few-Shot Text Classification Reality Check
  Several neural few-shot classification models have emerged, yielding significant progress over time.
  In this paper, we compare all these models, first adapting those made in the field of image processing to NLP, and second providing them access to transformers.
  We then test these models, equipped with the same transformer-based encoder, on the intent detection task, known for having a large number of classes.
  arXiv Detail & Related papers (2021-01-28T15:46:14Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study
  We investigate three schemes to improve the model generalization ability for few-shot settings.
  We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
  We create new state-of-the-art results on both few-shot and training-free settings.
  arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Beyond [CLS] through Ranking by Generation
  We revisit the generative framework for information retrieval.
  We show that our generative approaches are as effective as state-of-the-art semantic similarity-based discriminative models for the answer selection task.
  arXiv Detail & Related papers (2020-10-06T22:56:31Z)
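Leaderboards like the one described in the abstract typically compare retrieval models by nDCG@10, BEIR's standard effectiveness metric. A minimal sketch of the metric under binary relevance judgments (the document IDs and judgments below are made up for illustration):

```python
import math

def ndcg_at_k(ranked_ids, relevant, k=10):
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if doc_id in relevant else 0.0 for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal ranking places all relevant documents at the top ranks.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Toy run: two of three relevant documents retrieved in the top ranks.
print(round(ndcg_at_k(["d3", "d1", "d7"], {"d1", "d3", "d5"}), 3))  # 0.765
```

In practice a leaderboard would average this per-query score over each dataset's query set; doing so with one shared, audited implementation is what makes the reported comparisons consistent across submissions.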
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.