Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval
- URL: http://arxiv.org/abs/2304.00114v1
- Date: Fri, 31 Mar 2023 20:21:32 GMT
- Title: Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval
- Authors: Daniel Campos, ChengXiang Zhai
- Abstract summary: We study how sparse language models can be used for dense retrieval to improve inference efficiency.
We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
- Score: 37.22592489907125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending search to leverage contextual representations for documents and queries. As these vector-based systems rely on contextual language models, their usage commonly requires GPUs, which can be expensive and difficult to manage. Given recent advances in introducing sparsity into language models for improved inference efficiency, in this paper we study how sparse language models can be used for dense retrieval to improve inference efficiency. Using the popular retrieval library Tevatron and the MSMARCO, NQ, and TriviaQA datasets, we find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
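The "direct replacement" claim rests on the fact that a dense retriever only needs an encoder that maps text to fixed-size vectors; ranking is an inner product either way, so swapping in a sparsified language model changes inference cost without touching the retrieval interface. A minimal sketch of that interface, with one-hot bag-of-words vectors standing in for the contextual encoder (`VOCAB`, `encode`, and `search` are illustrative names, not Tevatron's API):

```python
import numpy as np

# Toy vocabulary and one-hot "embeddings". A real dense retriever would
# replace encode() with a contextual language model; sparsifying that
# model speeds up inference but leaves this interface untouched.
VOCAB = {"sparse": 0, "models": 1, "dense": 2, "retrieval": 3,
         "gpu": 4, "inference": 5}
EMB = np.eye(len(VOCAB))

def encode(text):
    # Mean-pool token vectors into one fixed-size representation.
    return EMB[[VOCAB[w] for w in text.split()]].mean(axis=0)

docs = ["sparse models", "dense retrieval", "gpu inference"]
doc_vecs = np.stack([encode(d) for d in docs])

def search(query, k=2):
    scores = doc_vecs @ encode(query)   # inner-product relevance
    return [docs[i] for i in np.argsort(-scores)[:k]]
```

In practice the document vectors are pre-computed offline, so the encoder's inference cost is paid mainly at query time, which is where sparsity pays off.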
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves retrieval performance competitive with state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Mistral-SPLADE: LLMs for better Learned Sparse Retrieval [7.652738829153342]
We propose to use a decoder-only model for learning semantic keyword expansion.
We use Mistral as the backbone to develop our Learned Sparse Retriever similar to SPLADE.
Our experiments support the hypothesis that a sparse retrieval model based on a decoder-only large language model (LLM) surpasses the performance of existing LSR systems.
arXiv Detail & Related papers (2024-08-20T18:21:54Z)
- BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer [1.911678487931003]
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval is to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
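The idea of n-grams as identifiers can be made concrete with a small sketch: every word n-gram of a passage is indexed, and a string emitted by an autoregressive model maps straight back to the passages containing it. The corpus, `ngrams`, and `retrieve` below are illustrative toys, not the paper's implementation:

```python
def ngrams(passage, max_n=3):
    # All word n-grams of a passage double as its retrieval identifiers.
    words = passage.split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}

corpus = {
    "d1": "sparse language models improve inference",
    "d2": "dense retrieval needs gpus",
}

# Invert the corpus: identifier string -> documents that contain it.
index = {}
for doc_id, text in corpus.items():
    for g in ngrams(text):
        index.setdefault(g, set()).add(doc_id)

def retrieve(generated):
    # A generated n-gram resolves to the passages it occurs in.
    return sorted(index.get(generated, set()))
```

Because no hierarchy is imposed, any substring the model generates is a valid entry point into the corpus, at the cost of a larger identifier set.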
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
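The mechanism behind such nearest-neighbor language models is to interpolate the parametric model's next-token distribution with one read off the datastore's nearest neighbors; the reported speed-ups come from making that lookup cheaper. A hedged sketch of the interpolation step, with a random toy datastore (`keys`, `values`, `lam`, and the sizes are illustrative, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(1)
V = 5                                   # toy vocabulary size
keys = rng.standard_normal((100, 4))    # datastore: context vectors...
values = rng.integers(0, V, 100)        # ...paired with next-token ids

def knn_dist(query, k=8, temp=1.0):
    # Distribution induced by the k nearest datastore entries.
    d = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(d)[:k]
    w = np.exp(-d[nn] / temp)           # closer neighbors weigh more
    p = np.zeros(V)
    np.add.at(p, values[nn], w)         # mass on the neighbors' tokens
    return p / p.sum()

def interpolate(p_lm, query, lam=0.25):
    # Final next-token distribution: (1 - lam) * LM + lam * kNN.
    return (1 - lam) * p_lm + lam * knn_dist(query)

p = interpolate(np.full(V, 1 / V), rng.standard_normal(4))
```

Shrinking the datastore or approximating the neighbor search reduces the cost of `knn_dist`, which dominates inference time in this setup.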
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.