Efficient Constant-Space Multi-Vector Retrieval
- URL: http://arxiv.org/abs/2504.01818v1
- Date: Wed, 02 Apr 2025 15:22:23 GMT
- Title: Efficient Constant-Space Multi-Vector Retrieval
- Authors: Sean MacAvaney, Antonio Mallia, Nicola Tonellotto,
- Abstract summary: We propose encoding documents to a fixed number of vectors, which are no longer necessarily tied to the input tokens.<n>We find that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.
- Score: 25.834026445124874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness. However, they come at a high cost in terms of storage since a (potentially compressed) vector needs to be stored for every token in the input collection. To overcome this issue, we propose encoding documents to a fixed number of vectors, which are no longer necessarily tied to the input tokens. Beyond reducing the storage costs, our approach has the advantage that document representations become of a fixed size on disk, allowing for better OS paging management. Through experiments using the MSMARCO passage corpus and BEIR with the ColBERT-v2 architecture, a representative multi-vector ranking model architecture, we find that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.
Related papers
- SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing [83.35231185111464]
Deep learning has shifted the retrieval paradigm from lexical-based matching to encode source code and queries into vector representations.<n>Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates.<n>We propose a novel approach, which converts long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy.
arXiv Detail & Related papers (2024-12-16T12:51:35Z) - PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [65.36715026409873]
Key-value (KV) cache, necessitated by the lengthy input and output sequences, notably contributes to the high inference cost.
We present PrefixKV, which reframes the challenge of determining KV cache sizes for all layers into the task of searching for the optimal global prefix configuration.
Our method achieves the state-of-the-art performance compared with others.
arXiv Detail & Related papers (2024-12-04T15:48:59Z) - Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling [5.232135930253723]
Multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to Neural IR.
However, the storage and memory requirements necessary to store the large number of associated vectors remain an important drawback.
We introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored.
arXiv Detail & Related papers (2024-09-23T03:12:43Z) - SparseCL: Sparse Contrastive Learning for Contradiction Retrieval [87.02936971689817]
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and crossencoder models exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z) - MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings [15.275864151890511]
We introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism which reduces multi-vector search to single-vector similarity search.
MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets.
arXiv Detail & Related papers (2024-05-29T20:40:20Z) - MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are
Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z) - CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for
Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z) - Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - SDR: Efficient Neural Re-ranking using Succinct Document Representation [4.9278175139681215]
We propose the Succinct Document Representation scheme that computes emphhighly compressed intermediate document representations.
Our method is highly efficient, achieving 4x-11.6x better compression rates for the same ranking quality.
arXiv Detail & Related papers (2021-10-03T07:43:16Z) - Sparse, Dense, and Attentional Representations for Text Retrieval [25.670835450331943]
Dual encoders perform retrieval by encoding documents and queries into dense lowdimensional vectors.
We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks.
We propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures.
arXiv Detail & Related papers (2020-05-01T02:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.