Resources for Brewing BEIR: Reproducible Reference Models and an
Official Leaderboard
- URL: http://arxiv.org/abs/2306.07471v1
- Date: Tue, 13 Jun 2023 00:26:18 GMT
- Title: Resources for Brewing BEIR: Reproducible Reference Models and an
Official Leaderboard
- Authors: Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma,
Jheng-Hong Yang, Jimmy Lin
- Abstract summary: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations.
Our work addresses two shortcomings that prevent the benchmark from achieving its full potential.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BEIR is a benchmark dataset for zero-shot evaluation of information retrieval
models across 18 different domain/task combinations. In recent years, we have
witnessed the growing popularity of a representation learning approach to
building retrieval models, typically using pretrained transformers in a
supervised setting. This naturally raises the question: How effective are these
models when presented with queries and documents that differ from the training
data? Examples include searching in different domains (e.g., medical or legal
text) and with different types of queries (e.g., keywords vs. well-formed
questions). While BEIR was designed to answer these questions, our work
addresses two shortcomings that prevent the benchmark from achieving its full
potential: First, the sophistication of modern neural methods and the
complexity of current software infrastructure create barriers to entry for
newcomers. To this end, we provide reproducible reference implementations that
cover the two main classes of approaches: learned dense and sparse models.
Second, there does not exist a single authoritative nexus for reporting the
effectiveness of different models on BEIR, which has led to difficulty in
comparing different methods. To remedy this, we present an official
self-service BEIR leaderboard that provides fair and consistent comparisons of
retrieval models. By addressing both shortcomings, our work facilitates future
explorations in a range of interesting research questions that BEIR enables.
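At query time, both classes of reference models the abstract mentions score a document as an inner product between query and document representations: dense models compare low-dimensional learned embeddings, while learned sparse models compare learned weights over vocabulary terms. A minimal sketch of this shared scoring step, with toy hand-made vectors standing in for real model outputs (all names and values here are illustrative, not from the paper):

```python
def dense_score(q_emb, d_emb):
    """Dense retrieval: dot product of low-dimensional learned embeddings."""
    return sum(q * d for q, d in zip(q_emb, d_emb))

def sparse_score(q_weights, d_weights):
    """Learned sparse retrieval: dot product over shared vocabulary terms."""
    return sum(w * d_weights.get(term, 0.0) for term, w in q_weights.items())

# Toy query/document representations (stand-ins for real encoder outputs).
q_dense, d_dense = [0.1, 0.9, 0.3], [0.2, 0.8, 0.0]
q_sparse = {"beir": 1.5, "benchmark": 0.7}
d_sparse = {"beir": 1.2, "retrieval": 0.9}

print(round(dense_score(q_dense, d_dense), 3))    # 0.74
print(round(sparse_score(q_sparse, d_sparse), 3))  # 1.8
```

In both cases ranking reduces to sorting documents by this score, which is why the same indexing and search infrastructure can serve both model classes.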
Related papers
- Teaching Smaller Language Models To Generalise To Unseen Compositional Questions (Full Thesis)
  We train our models to answer diverse questions by instilling an ability to reason over a retrieved context.
  We acquire context from two knowledge sources: a Wikipedia corpus queried using a multi-hop dense retrieval system with novel extensions, and rationales generated by a larger language model optimised to run in a lower-resource environment.
  arXiv Detail & Related papers (2024-11-25T23:25:34Z)
- List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation
  We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
  GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
  Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
  arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
  Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
  This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
  arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- RelVAE: Generative Pretraining for Few-shot Visual Relationship Detection
  We present the first pretraining method for few-shot predicate classification that does not require any annotated relations.
  We construct few-shot training splits and show quantitative experiments on the VG200 and VRD datasets.
  arXiv Detail & Related papers (2023-11-27T19:08:08Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph
  Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
  We propose UniKGQA, a novel approach for the multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
  arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking
  We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
  To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
  arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
  We introduce BEIR, a heterogeneous benchmark for information retrieval.
  We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup.
  Dense-retrieval models are computationally more efficient but often underperform other approaches.
  arXiv Detail & Related papers (2021-04-17T23:29:55Z)
- A Neural Few-Shot Text Classification Reality Check
  Several neural few-shot classification models have emerged, yielding significant progress over time.
  In this paper, we compare all these models, first adapting those made in the field of image processing to NLP, and second providing them access to transformers.
  We then test these models, equipped with the same transformer-based encoder, on the intent detection task, known for having a large number of classes.
  arXiv Detail & Related papers (2021-01-28T15:46:14Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study
  We investigate three schemes to improve the model generalization ability for few-shot settings.
  We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
  We create new state-of-the-art results on both few-shot and training-free settings.
  arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Beyond [CLS] through Ranking by Generation
  We revisit the generative framework for information retrieval.
  We show that our generative approaches are as effective as state-of-the-art semantic similarity-based discriminative models for the answer selection task.
  arXiv Detail & Related papers (2020-10-06T22:56:31Z)
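Leaderboards like the one described in the abstract typically compare retrieval models by nDCG@10, BEIR's standard effectiveness metric. A minimal sketch of the metric under binary relevance judgments (the document IDs and judgments below are made up for illustration):

```python
import math

def ndcg_at_k(ranked_ids, relevant, k=10):
    """nDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if doc_id in relevant else 0.0 for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal ranking places all relevant documents at the top ranks.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Toy run: two of three relevant documents retrieved in the top ranks.
print(round(ndcg_at_k(["d3", "d1", "d7"], {"d1", "d3", "d5"}), 3))  # 0.765
```

In practice a leaderboard would average this per-query score over each dataset's query set; doing so with one shared, audited implementation is what makes the reported comparisons consistent across submissions.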
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.