SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval
- URL: http://arxiv.org/abs/2307.10488v1
- Date: Wed, 19 Jul 2023 22:48:02 GMT
- Title: SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval
- Authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin
- Abstract summary: We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the widely used BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditionally, sparse retrieval systems that rely on lexical
representations to retrieve documents, such as BM25, dominated information
retrieval tasks. With the advent of pre-trained transformer models such as
BERT, neural sparse retrieval has led to a new paradigm within retrieval.
Despite this success,
there has been limited software supporting different sparse retrievers running
in a unified, common environment. This hinders practitioners from fairly
comparing different sparse models and obtaining realistic evaluation results.
Another missing piece is that a majority of prior work evaluates sparse
retrieval models only in-domain, i.e., on a single dataset: MS MARCO.
However, practical retrieval systems require models that generalize well to
unseen out-of-domain data, i.e., zero-shot retrieval tasks. In
this work, we provide SPRINT, a unified Python toolkit based on Pyserini and
Lucene, supporting a common interface for evaluating neural sparse retrieval.
The toolkit currently includes five built-in models: uniCOIL, DeepImpact,
SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by
defining their term weighting method. Using our toolkit, we establish strong
and reproducible zero-shot sparse retrieval baselines on the widely used
BEIR benchmark. Our results demonstrate that SPLADEv2
achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural
sparse retrievers. We further uncover the reasons behind this performance
gain: we show that SPLADEv2 produces sparse representations in which a
majority of tokens lie outside the original query and document, an expansion
behavior that is often crucial for its gains and that its sparse
counterparts lack. We make our SPRINT toolkit, models, and the data used in
our experiments publicly available at
https://github.com/thakur-nandan/sprint.
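To make the customization hook concrete, here is a minimal sketch of what a
user-defined term-weighting method can look like: a function mapping a text to
integer term impacts that an inverted-index engine such as Lucene can store
and score. The function name, signature, and weighting scheme are illustrative
assumptions, not SPRINT's actual interface.

```python
from collections import Counter
import math

# Hypothetical custom sparse model: the user supplies only a term-weighting
# method that maps text -> {term: integer impact}.
def term_weights(text: str, scale: int = 100) -> dict:
    counts = Counter(text.lower().split())
    norm = math.log1p(sum(counts.values()))
    # Log-saturated term frequency, quantized to integer "impacts" so a
    # standard impact-based inverted index can store them per posting.
    return {term: max(1, round(scale * math.log1p(tf) / norm))
            for term, tf in counts.items()}

print(term_weights("neural sparse retrieval with sparse representations"))
# The repeated term "sparse" receives the highest integer impact.
```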
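For reference, nDCG@10 (the metric quoted above) is the discounted cumulative
gain of the top 10 ranked documents, normalized by the gain of an ideal
ordering. Below is a minimal sketch of one common linear-gain formulation; the
function name and the toy example are ours.

```python
import math

def ndcg_at_10(ranked_rels, judged_rels):
    """nDCG@10 with linear gains: DCG of the ranking / DCG of the ideal ranking.

    ranked_rels: graded relevance of the returned documents, in rank order.
    judged_rels: graded relevance of all judged documents for the query.
    """
    def dcg(rels):
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:10]))
    ideal = dcg(sorted(judged_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

# Toy example: the single relevant document is ranked second.
print(ndcg_at_10([0, 1], [1, 0]))  # 1/log2(3) ~= 0.631
```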
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new state-of-the-art end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
We propose to use a decoder-only model for learning semantic keyword expansion.
We use Mistral as the backbone to develop our Learned Sparse Retriever, similar to SPLADE.
Our experiments support the hypothesis that a sparse retrieval model based on a decoder-only large language model (LLM) surpasses the performance of existing LSR systems.
arXiv Detail & Related papers (2024-08-20T18:21:54Z) - Contextualization with SPLADE for High Recall Retrieval
High Recall Retrieval (HRR) is a search problem that optimizes the cost of retrieving most of the relevant documents in a given collection.
In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors.
It reduces the review cost by 10% and 18% on two HRR evaluation collections under a one-phase review workflow with a target recall of 80%.
arXiv Detail & Related papers (2024-05-07T03:05:37Z) - RealPatch: A Statistical Matching Framework for Model Patching with Real
Samples
RealPatch is a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching.
We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility.
arXiv Detail & Related papers (2022-08-03T16:22:30Z) - Injecting Domain Adaptation with Learning-to-hash for Effective and
Efficient Zero-shot Dense Retrieval
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies, when applied naively, can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
arXiv Detail & Related papers (2022-05-23T17:53:44Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text
Retrieval
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z) - InPars: Data Augmentation for Information Retrieval using Large Language
Models
In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for information retrieval tasks.
We show that models finetuned solely on our unsupervised dataset outperform strong baselines such as BM25.
Retrievers finetuned on both supervised and our synthetic data achieve better zero-shot transfer than models finetuned only on supervised data.
arXiv Detail & Related papers (2022-02-10T16:52:45Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved, with more than 9% gains in nDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
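As a concrete illustration of the SPLADE weighting described above, and of the
query/document expansion that the SPRINT analysis credits for SPLADEv2's
zero-shot gains, here is a minimal sketch using the published log-saturated,
max-pooled MLM-logit formulation. The Hugging Face checkpoint name is an
assumption; any SPLADE-family masked-LM checkpoint should behave similarly.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

CKPT = "naver/splade-cocondenser-ensembledistil"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForMaskedLM.from_pretrained(CKPT).eval()

def splade_weights(text: str) -> dict:
    """Sparse term weights: max over positions of log(1 + ReLU(MLM logits))."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits)).amax(dim=1).squeeze(0)
    nonzero = weights.nonzero().squeeze(1).tolist()
    return {tokenizer.convert_ids_to_tokens(i): weights[i].item() for i in nonzero}

query = "what causes acid rain?"
weights = splade_weights(query)
# Activated tokens that never occur in the query itself: the expansion terms
# that the analysis above identifies as crucial to SPLADEv2's performance.
expansions = {t: w for t, w in weights.items()
              if t not in set(tokenizer.tokenize(query))}
print(sorted(expansions.items(), key=lambda kv: -kv[1])[:10])
```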