CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval
- URL: http://arxiv.org/abs/2211.10411v1
- Date: Fri, 18 Nov 2022 18:27:35 GMT
- Title: CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval
- Authors: Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin,
Yashar Mehdad, Wen-tau Yih, and Xilun Chen
- Abstract summary: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and
dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on
various retrieval tasks. These methods, however, are orders of magnitude slower
and need much more space to store their indices compared to their single-vector
counterparts. In this paper, we unify different multi-vector retrieval models
from a token routing viewpoint and propose conditional token interaction via
dynamic lexical routing, namely CITADEL, for efficient and effective
multi-vector retrieval. CITADEL learns to route different token vectors to the
predicted lexical "keys" such that a query token vector only interacts with
document token vectors routed to the same key. This design significantly
reduces the computation cost while maintaining high accuracy. Notably, CITADEL
achieves the same or slightly better performance than the previous state of the
art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR)
evaluations, while being nearly 40 times faster. Code and data are available at
https://github.com/facebookresearch/dpr-scale.
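The core idea of conditional token interaction can be sketched in a few lines: each token vector is routed to a lexical key, and ColBERT-style max-sim scoring is restricted to pairs that share a key. The sketch below is illustrative only; the random-projection router stands in for CITADEL's learned lexical-key predictor, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_keys = 8, 16

# Hypothetical router: CITADEL learns the lexical-key predictor;
# a fixed random projection stands in for it here.
W_route = rng.normal(size=(dim, n_keys))

def route(tok_vecs):
    """Assign each token vector to its single highest-scoring lexical key."""
    return np.argmax(tok_vecs @ W_route, axis=1)

def citadel_score(q_vecs, d_vecs):
    """ColBERT-style max-sim, restricted so a query token only interacts
    with document tokens routed to the same lexical key."""
    q_keys, d_keys = route(q_vecs), route(d_vecs)
    score = 0.0
    for qv, qk in zip(q_vecs, q_keys):
        mask = d_keys == qk
        if mask.any():  # no shared key -> no interaction, no compute
            score += float(np.max(d_vecs[mask] @ qv))
    return score

q = rng.normal(size=(3, dim))
d = rng.normal(size=(10, dim))
print(citadel_score(q, d))
```

Because each query token only touches the (typically small) bucket of document tokens sharing its key, the interaction cost drops from all-pairs to per-bucket, which is the source of the claimed speedup.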
Related papers
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
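The dense-to-sparse transformation described above can be sketched as a projection into vocabulary space followed by non-negativity and top-k pruning. This is a minimal illustration under assumed shapes; the projection matrix here is random, whereas the paper learns it on top of a frozen dense model.

```python
import numpy as np

rng = np.random.default_rng(1)
dense_dim, vocab_size, k = 16, 100, 8

# Hypothetical projection from dense space to vocabulary logits;
# the real model learns this layer.
W_proj = rng.normal(size=(dense_dim, vocab_size))

def to_sparse_lexical(dense_vec, k=k):
    """Map a dense vector to a non-negative sparse lexical vector:
    project to vocabulary logits, apply ReLU, keep only the top-k entries."""
    logits = np.maximum(dense_vec @ W_proj, 0.0)  # non-negativity
    sparse = np.zeros_like(logits)
    top = np.argsort(-logits)[:k]
    sparse[top] = logits[top]
    return sparse

s = to_sparse_lexical(rng.normal(size=dense_dim))
print(int(np.count_nonzero(s)))
```

The resulting vectors can be served from a standard inverted index, since each nonzero dimension corresponds to a vocabulary term.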
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation [8.46894039954642]
We propose a novel multi-scale token adaptation algorithm for interactive segmentation.
By performing top-k operations across multi-scale tokens, the computational cost is greatly reduced.
We also propose a token learning algorithm based on contrastive loss.
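The top-k selection over multi-scale tokens can be sketched as follows. All sizes and the relevance scoring are assumptions for illustration; the paper's adaptation is learned, not a fixed dot product.

```python
import numpy as np

rng = np.random.default_rng(7)
# Token features at three scales (different token counts per scale).
scales = [rng.normal(size=(n, 8)) for n in (64, 16, 4)]
query_vec = rng.normal(size=8)  # hypothetical interaction embedding

def topk_multiscale(scales, q, k=4):
    """Keep only the top-k most relevant tokens at each scale, so that
    downstream attention runs over 3*k tokens instead of all 84."""
    kept = []
    for toks in scales:
        rel = toks @ q                 # hypothetical relevance score
        idx = np.argsort(-rel)[:k]
        kept.append(toks[idx])
    return np.vstack(kept)

sel = topk_multiscale(scales, query_vec)
print(sel.shape)
```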
arXiv Detail & Related papers (2024-01-09T07:59:42Z) - LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
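The two ingredients named above, linear dimensionality reduction and vector quantization, can be combined in a minimal sketch. Plain PCA and 8-bit scalar quantization stand in for LeanVec's learned projection and its actual quantizer; all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d_low = 200, 32, 8
X = rng.normal(size=(n, d))  # database vectors

# Linear dimensionality reduction via PCA (LeanVec learns its own
# projection; plain PCA stands in here).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:d_low].T  # d x d_low projection

# 8-bit scalar quantization of the reduced database vectors.
X_low = X @ P
lo, hi = X_low.min(), X_low.max()
codes = np.round((X_low - lo) / (hi - lo) * 255).astype(np.uint8)
X_deq = codes.astype(np.float64) / 255 * (hi - lo) + lo

def search(q, topk=5):
    """Approximate inner-product search in the reduced, quantized space."""
    scores = X_deq @ (q @ P)
    return np.argsort(-scores)[:topk]

hits = search(X[0])
```

Storing `codes` instead of `X` shrinks the index by a factor of `d * 8 / d_low` here (32-bit floats in 32 dims down to 8-bit codes in 8 dims), which is where the throughput and build-time gains come from.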
arXiv Detail & Related papers (2023-12-26T21:14:59Z) - SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines on the well-established BEIR benchmark.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Rethinking the Role of Token Retrieval in Multi-Vector Retrieval [22.508682857329912]
Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents.
We present XTR, ConteXtualized Token Retriever, which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first.
arXiv Detail & Related papers (2023-04-04T17:37:06Z) - Multi-Vector Retrieval as Sparse Alignment [21.892007741798853]
We propose a novel multi-vector retrieval model that learns sparsified pairwise alignments between query and document tokens.
We learn the sparse unary saliences with entropy-regularized linear programming, which achieves sparsity more effectively than alternative methods.
Our model often produces interpretable alignments and significantly improves its performance when initialized from larger language models.
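Entropy-regularized linear programs of this kind are commonly solved with Sinkhorn iterations, so a sketch of the alignment step might look as follows. The uniform marginals, temperature, and iteration count are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, dim = 4, 6, 8
Q = rng.normal(size=(m, dim))  # query token vectors
D = rng.normal(size=(n, dim))  # document token vectors

def sinkhorn_alignment(S, eps=0.1, iters=200):
    """Entropy-regularized alignment via Sinkhorn iterations, a standard
    solver for entropy-regularized linear programs."""
    r = np.full(S.shape[0], 1.0 / S.shape[0])  # uniform row marginals
    c = np.full(S.shape[1], 1.0 / S.shape[1])  # uniform column marginals
    K = np.exp(S / eps)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]

S = Q @ D.T                   # token-level similarity matrix
A = sinkhorn_alignment(S)     # soft, sparse-ish alignment weights
score = float((A * S).sum())  # alignment-weighted relevance score
```

Smaller `eps` pushes the alignment matrix closer to a hard (sparse) assignment, which is how the entropy term controls sparsity.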
arXiv Detail & Related papers (2022-11-02T16:49:58Z) - Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
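The matrix-factorization idea behind this trade-off can be sketched on a synthetic score matrix: factor the cross-encoder's query-item scores into latent vectors, then search with cheap inner products. In the paper only a sparse subset of entries is actually scored; here a full low-rank synthetic matrix is factorized purely to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(4)
n_queries, n_items, rank = 50, 200, 5

# Synthetic low-rank "cross-encoder" score matrix (hypothetical data).
S = rng.normal(size=(n_queries, rank)) @ rng.normal(size=(rank, n_items))

U, s, Vt = np.linalg.svd(S, full_matrices=False)
query_emb = U[:, :rank] * s[:rank]  # latent query vectors
item_emb = Vt[:rank].T              # latent item vectors

# Nearest-neighbor search now runs over inner products of latent
# vectors instead of fresh cross-encoder forward passes.
approx = query_emb @ item_emb.T
rel_err = np.linalg.norm(S - approx) / np.linalg.norm(S)
print(rel_err)
```

Raising the rank recovers more of the cross-encoder's scores at higher cost, which is the recall-vs-compute knob the abstract refers to.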
arXiv Detail & Related papers (2022-10-23T00:32:04Z) - Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for
Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.