SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval
- URL: http://arxiv.org/abs/2307.10488v1
- Date: Wed, 19 Jul 2023 22:48:02 GMT
- Title: SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval
- Authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin
- Abstract summary: We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the well-established BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
- Score: 92.27387459751309
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditionally, sparse retrieval systems that rely on lexical
representations, such as BM25, dominated information retrieval tasks. With
the advent of pre-trained transformer models such as BERT, neural sparse
retrieval has led to a new paradigm within retrieval. Despite this success,
there has been limited software supporting different sparse retrievers in a
unified, common environment, which hinders practitioners from fairly
comparing different sparse models and obtaining realistic evaluation results.
Another missing piece is that a majority of prior work evaluates sparse
retrieval models only in-domain, i.e., on a single dataset: MS MARCO.
However, practical retrieval systems require models that generalize well to
unseen out-of-domain, i.e., zero-shot, retrieval tasks. In
this work, we provide SPRINT, a unified Python toolkit based on Pyserini and
Lucene, supporting a common interface for evaluating neural sparse retrieval.
The toolkit currently includes five built-in models: uniCOIL, DeepImpact,
SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by
defining their own term weighting method. Using our toolkit, we establish
strong and reproducible zero-shot sparse retrieval baselines on the
well-established BEIR benchmark. Our results demonstrate that SPLADEv2
achieves the best average score, 0.470 nDCG@10 on BEIR, amongst all neural
sparse retrievers. We further uncover the reasons behind this performance
gain: SPLADEv2 produces sparse representations in which a majority of tokens
lie outside the original query and document, an expansion that is often
crucial to its gains and that its sparse counterparts lack. We provide our
SPRINT toolkit, models, and the data used in our experiments publicly at
https://github.com/thakur-nandan/sprint.
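Code sketch: the "customized models" hook described in the abstract amounts to supplying a term-weighting function, i.e. a map from a text to token-to-weight pairs that the toolkit can then index with Lucene. A minimal illustration in Python follows; the class and method names (CountTermWeighter, weight_terms) are hypothetical, not SPRINT's actual interface, so consult the repository for the real API.

    # Minimal sketch of the custom term-weighting idea described above.
    # The class and method names here are hypothetical illustrations, NOT
    # SPRINT's actual interface; consult the repository for the real API.
    from collections import Counter
    import math

    class CountTermWeighter:
        """Toy term weighter: log-saturated term frequency over whitespace tokens."""

        def weight_terms(self, text: str) -> dict:
            counts = Counter(text.lower().split())
            # Log-saturate counts so long documents do not dominate, mirroring
            # the saturation used by learned sparse models such as SPLADE.
            return {token: math.log(1.0 + tf) for token, tf in counts.items()}

    if __name__ == "__main__":
        weighter = CountTermWeighter()
        print(weighter.weight_terms("sparse retrieval with sparse representations"))

Any model expressible this way produces a sparse vector over the vocabulary, which is what puts lexical BM25-style systems and neural sparse retrievers behind one common evaluation interface.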
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared with state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Mistral-SPLADE: LLMs for better Learned Sparse Retrieval [7.652738829153342]
We propose to use a decoder-only model for learning semantic keyword expansion.
We use Mistral as the backbone to develop our Learned Sparse Retriever, similar to SPLADE.
Our experiments support the hypothesis that a sparse retrieval model based on a decoder-only large language model (LLM) surpasses the performance of existing LSR systems.
arXiv Detail & Related papers (2024-08-20T18:21:54Z)
- RealPatch: A Statistical Matching Framework for Model Patching with Real Samples [6.245453620070586]
RealPatch is a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching.
We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility.
arXiv Detail & Related papers (2022-08-03T16:22:30Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies, when applied naively, can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- InPars: Data Augmentation for Information Retrieval using Large Language Models [5.851846467503597]
In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for information retrieval tasks.
We show that models finetuned solely on our unsupervised dataset outperform strong baselines such as BM25.
Retrievers finetuned on both supervised and our synthetic data achieve better zero-shot transfer than models finetuned only on supervised data.
arXiv Detail & Related papers (2022-02-10T16:52:45Z)
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model based solely on document expansion, and introduce models trained with distillation; the pooling change is sketched in the code after this list.
Overall, SPLADE is considerably improved, with more than 9% gains on nDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
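Code sketch: to connect the SPLADE v2 entry with the expansion behavior discussed in the SPRINT abstract above, here is a minimal PyTorch sketch of SPLADE-style term weighting. It assumes logits holds masked-language-model logits of shape (seq_len, vocab_size) and mask flags real (non-padding) tokens; SPLADE v1 sum-pools the log-saturated activations over the sequence, and the SPLADE v2 pooling change replaces the sum with a max. This illustrates the published formula and is not code from the SPLADE or SPRINT repositories.

    import torch

    def splade_weights(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Log-saturated activation for every (input-token, vocab-term) pair.
        sat = torch.log1p(torch.relu(logits))
        # Zero out padding positions before pooling.
        sat = sat * mask.unsqueeze(-1)
        # SPLADEv2: max pooling over the sequence dimension (v1 used sum).
        return sat.max(dim=0).values  # shape: (vocab_size,)

    if __name__ == "__main__":
        logits = torch.randn(8, 30522)  # e.g. a BERT-sized vocabulary
        mask = torch.ones(8)
        weights = splade_weights(logits, mask)
        # Any vocabulary term can receive a non-zero weight, which is how
        # expansion terms outside the original query or document arise.
        print(weights.shape, int((weights > 0).sum()), "non-zero term weights")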