Locally-Adaptive Quantization for Streaming Vector Search
- URL: http://arxiv.org/abs/2402.02044v1
- Date: Sat, 3 Feb 2024 05:43:39 GMT
- Title: Locally-Adaptive Quantization for Streaming Vector Search
- Authors: Cecilia Aguerrebere and Mark Hildebrand and Ishwar Singh Bhati and
Theodore Willke and Mariano Tepper
- Abstract summary: Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector compression method, yields state-of-the-art search performance for non-evolving databases.
We introduce two improvements of LVQ, Turbo LVQ and multi-means LVQ, that boost its search performance by up to 28% and 27%, respectively.
Our studies show that LVQ and its new variants enable blazing fast vector search, outperforming its closest competitor by up to 9.4x for identically distributed data.
- Score: 1.151101202055732
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieving the most similar vector embeddings to a given query among a
massive collection of vectors has long been a key component of countless
real-world applications. The recently introduced Retrieval-Augmented Generation
is one of the most prominent examples. For many of these applications, the
database evolves over time by inserting new data and removing outdated data. In
these cases, the retrieval problem is known as streaming similarity search.
While Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector
compression method, yields state-of-the-art search performance for non-evolving
databases, its usefulness in the streaming setting has not yet been
established. In this work, we study LVQ in streaming similarity search. In
support of our evaluation, we introduce two improvements of LVQ: Turbo LVQ and
multi-means LVQ that boost its search performance by up to 28% and 27%,
respectively. Our studies show that LVQ and its new variants enable blazing
fast vector search, outperforming its closest competitor by up to 9.4x for
identically distributed data and by up to 8.8x under the challenging scenario
of data distribution shifts (i.e., where the statistical distribution of the
data changes over time). We release our contributions as part of Scalable
Vector Search, an open-source library for high-performance similarity search.
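As context for the abstract above: LVQ is a form of scalar quantization whose grid adapts to each individual vector, and, going by its name, multi-means LVQ presumably centers vectors with per-cluster means instead of one global mean. The snippet below is a minimal NumPy sketch of the locally adaptive idea only; the function names, the rounding rule, and the multi-means interpretation are illustrative assumptions, not the Scalable Vector Search API, and Turbo LVQ, which the paper presents as a faster variant, is not modeled here.

```python
import numpy as np

def lvq_quantize(x: np.ndarray, mean: np.ndarray, bits: int = 8):
    """Per-vector ("locally adaptive") scalar quantization sketch.

    `mean` is the dataset mean in plain LVQ; multi-means LVQ would pass the
    mean of the cluster `x` belongs to (an assumption based on the name).
    Assumes bits <= 8 so that codes fit in uint8.
    """
    r = x - mean                               # center the vector first
    lo, hi = float(r.min()), float(r.max())    # this vector's own range
    delta = (hi - lo) / (2**bits - 1) or 1.0   # per-vector step; guard hi == lo
    codes = np.round((r - lo) / delta).astype(np.uint8)
    return codes, lo, delta                    # lo and delta are stored per vector

def lvq_reconstruct(codes, lo, delta, mean):
    """Approximately invert lvq_quantize (error <= delta / 2 per component)."""
    return mean + lo + codes.astype(np.float32) * delta

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128)).astype(np.float32)
mu = X.mean(axis=0)                            # global mean, as in plain LVQ
codes, lo, delta = lvq_quantize(X[0], mu)
print(np.abs(lvq_reconstruct(codes, lo, delta, mu) - X[0]).max())
```

Because `lo` and `delta` are chosen per vector, the quantization grid tracks each vector's own dynamic range, which is plausibly what keeps the scheme effective when the data distribution drifts in a streaming database.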
Related papers
- GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction [1.1599570446840546]
Cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly.
It is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors.
We present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search.
arXiv Detail & Related papers (2024-10-14T21:14:27Z)
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption.
Our evaluation shows that RetrievalAttention only needs to access 1--3% of data while maintaining high model accuracy.
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
- Efficient Data Access Paths for Mixed Vector-Relational Search [8.80592433569832]
Machine learning and the adoption of data processing methods using vector embeddings have sparked great interest in creating systems for vector data management.
While the predominant approach of vector data management is to use specialized index structures for fast search over the entirety of the vector embeddings, once combined with other (meta)data, the search queries can also become selective on relational attributes.
As using vector indexes differs from traditional relational data access, we revisit and analyze alternative access paths for efficient mixed vector-relational search.
arXiv Detail & Related papers (2024-03-23T11:34:17Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
arXiv Detail & Related papers (2023-12-26T21:14:59Z)
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
- Similarity search in the blink of an eye with compressed indices [3.39271933237479]
Graph-based indices are currently the best performing techniques for billion-scale similarity search.
We present new techniques and systems for creating faster and smaller graph-based indices.
arXiv Detail & Related papers (2023-04-07T23:10:39Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector interaction, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Indexing methods have to trade off between obtaining high accuracy and maintaining load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)