CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval
- URL: http://arxiv.org/abs/2211.10411v1
- Date: Fri, 18 Nov 2022 18:27:35 GMT
- Title: CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval
- Authors: Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin,
Yashar Mehdad, Wen-tau Yih, and Xilun Chen
- Abstract summary: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
- Score: 72.90850213615427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and
dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on
various retrieval tasks. These methods, however, are orders of magnitude slower
and need much more space to store their indices compared to their single-vector
counterparts. In this paper, we unify different multi-vector retrieval models
from a token routing viewpoint and propose conditional token interaction via
dynamic lexical routing, namely CITADEL, for efficient and effective
multi-vector retrieval. CITADEL learns to route different token vectors to the
predicted lexical "keys" such that a query token vector only interacts with
document token vectors routed to the same key. This design significantly
reduces the computation cost while maintaining high accuracy. Notably, CITADEL
achieves the same or slightly better performance than the previous state of the
art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR)
evaluations, while being nearly 40 times faster. Code and data are available at
https://github.com/facebookresearch/dpr-scale.
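The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each token vector has already been assigned to exactly one lexical key (CITADEL learns that assignment, which is omitted here), and it ignores any router weights. The function names `build_index`, `routed_score`, and `all_pairs_score`, as well as the toy vectors, are hypothetical. The all-pairs function is a ColBERT-style MaxSim baseline included only to compare the number of dot products.

```python
from collections import defaultdict


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(doc_tokens):
    """Group document token vectors by the key each one was routed to.

    doc_tokens is a list of (key, vector) pairs; the routing itself is
    assumed to be given (hypothetical input, learned in the real model).
    """
    index = defaultdict(list)
    for key, vec in doc_tokens:
        index[key].append(vec)
    return index


def routed_score(query_tokens, doc_index):
    """Score with conditional interaction: a query token is compared only
    with document tokens that share its key. Returns (score, comparisons)."""
    score = 0.0
    comparisons = 0
    for key, q_vec in query_tokens:
        candidates = doc_index.get(key, [])
        comparisons += len(candidates)
        if candidates:
            score += max(dot(q_vec, d_vec) for d_vec in candidates)
    return score, comparisons


def all_pairs_score(query_tokens, doc_tokens):
    """Baseline: every query token is compared with every document token
    and keeps its best match. Returns (score, comparisons)."""
    score = 0.0
    comparisons = 0
    for _, q_vec in query_tokens:
        sims = [dot(q_vec, d_vec) for _, d_vec in doc_tokens]
        comparisons += len(sims)
        if sims:
            score += max(sims)
    return score, comparisons


# Toy data: keys and vectors are made up for illustration.
query = [("retrieval", [1.0, 0.0]), ("fast", [0.0, 1.0])]
doc = [
    ("retrieval", [0.5, 0.5]),
    ("retrieval", [1.0, 0.0]),
    ("index", [0.0, 2.0]),
]

routed = routed_score(query, build_index(doc))
baseline = all_pairs_score(query, doc)
print("routed:", routed)
print("all pairs:", baseline)
```

On this toy input the routed variant performs 2 dot products where the all-pairs baseline performs 6. The two scores also differ, because the "fast" query token has no document token under the same key. In this sketch, any accuracy therefore depends entirely on how well the keys are assigned.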
Related papers
- Multi-Vector Index Compression in Any Modality [73.7330345057813]
Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos.
We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC).
AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation.
arXiv Detail & Related papers (2026-02-24T18:57:33Z) - LEMUR: Learned Multi-Vector Retrieval [9.22384870426709]
We introduce LEMUR, a framework for multi-vector similarity search.
LEMUR consists of two consecutive problem reductions.
LEMUR is an order of magnitude faster than earlier multi-vector similarity search methods.
arXiv Detail & Related papers (2026-01-29T15:26:32Z) - Incorporating Token Importance in Multi-Vector Retrieval [12.87368993054882]
ColBERT encodes queries and documents using BERT, and computes similarity via fine-grained interactions over token-level vector representations.
We introduce enhancements to the Chamfer distance function by computing a weighted sum over query token contributions.
Our method achieves an average improvement of 1.28% in Recall@10 in the zero-shot setting using IDF-based weights, and 3.66% through few-shot fine-tuning.
arXiv Detail & Related papers (2025-11-20T06:58:31Z) - SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs [59.415473779171315]
We propose a novel visual token pruning strategy called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE).
arXiv Detail & Related papers (2025-10-28T09:29:37Z) - Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy [36.03315207229038]
HEAVEN is a two-stage hybrid-vector framework for visually rich document retrieval.
It efficiently retrieves candidate pages using a single-vector method over Visually-Summarized Pages.
It reranks candidates with a multi-vector method while filtering query tokens by linguistic importance to reduce redundant computations.
arXiv Detail & Related papers (2025-10-25T08:27:37Z) - Efficient Constant-Space Multi-Vector Retrieval [25.834026445124874]
We propose encoding documents to a fixed number of vectors, which are no longer necessarily tied to the input tokens.
We find that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.
arXiv Detail & Related papers (2025-04-02T15:22:23Z) - MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings [15.275864151890511]
We introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism which reduces multi-vector search to single-vector similarity search.
MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets.
arXiv Detail & Related papers (2024-05-29T20:40:20Z) - Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
arXiv Detail & Related papers (2023-12-26T21:14:59Z) - SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Rethinking the Role of Token Retrieval in Multi-Vector Retrieval [22.508682857329912]
Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents.
We present XTR, ConteXtualized Token Retriever, which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first.
arXiv Detail & Related papers (2023-04-04T17:37:06Z) - Multi-Vector Retrieval as Sparse Alignment [21.892007741798853]
We propose a novel multi-vector retrieval model that learns sparsified pairwise alignments between query and document tokens.
We learn the sparse unary saliences with entropy-regularized linear programming, which outperforms other methods to achieve sparsity.
Our model often produces interpretable alignments and significantly improves its performance when initialized from larger language models.
arXiv Detail & Related papers (2022-11-02T16:49:58Z) - Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z) - Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for
Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.