Locally-Adaptive Quantization for Streaming Vector Search
- URL: http://arxiv.org/abs/2402.02044v1
- Date: Sat, 3 Feb 2024 05:43:39 GMT
- Title: Locally-Adaptive Quantization for Streaming Vector Search
- Authors: Cecilia Aguerrebere and Mark Hildebrand and Ishwar Singh Bhati and
Theodore Willke and Mariano Tepper
- Abstract summary: Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector compression method, yields state-of-the-art search performance for non-evolving databases.
We introduce two improvements of LVQ, Turbo LVQ and multi-means LVQ, that boost its search performance by up to 28% and 27%, respectively.
Our studies show that LVQ and its new variants enable blazing fast vector search, outperforming its closest competitor by up to 9.4x for identically distributed data.
- Score: 1.151101202055732
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieving the most similar vector embeddings to a given query among a
massive collection of vectors has long been a key component of countless
real-world applications. The recently introduced Retrieval-Augmented Generation
is one of the most prominent examples. For many of these applications, the
database evolves over time by inserting new data and removing outdated data. In
these cases, the retrieval problem is known as streaming similarity search.
While Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector
compression method, yields state-of-the-art search performance for non-evolving
databases, its usefulness in the streaming setting has not yet been
established. In this work, we study LVQ in streaming similarity search. In
support of our evaluation, we introduce two improvements of LVQ: Turbo LVQ and
multi-means LVQ that boost its search performance by up to 28% and 27%,
respectively. Our studies show that LVQ and its new variants enable blazing
fast vector search, outperforming its closest competitor by up to 9.4x for
identically distributed data and by up to 8.8x under the challenging scenario
of data distribution shifts (i.e., where the statistical distribution of the
data changes over time). We release our contributions as part of Scalable
Vector Search, an open-source library for high-performance similarity search.
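As context for the abstract above: LVQ is a form of scalar quantization whose grid adapts to each individual vector, and, going by its name, multi-means LVQ presumably centers vectors with per-cluster means instead of one global mean. The snippet below is a minimal NumPy sketch of the locally adaptive idea only; the function names, the rounding rule, and the multi-means interpretation are illustrative assumptions, not the Scalable Vector Search API, and Turbo LVQ, which the paper presents as a faster variant, is not modeled here.

```python
import numpy as np

def lvq_quantize(x: np.ndarray, mean: np.ndarray, bits: int = 8):
    """Per-vector ("locally adaptive") scalar quantization sketch.

    `mean` is the dataset mean in plain LVQ; multi-means LVQ would pass the
    mean of the cluster `x` belongs to (an assumption based on the name).
    Assumes bits <= 8 so that codes fit in uint8.
    """
    r = x - mean                               # center the vector first
    lo, hi = float(r.min()), float(r.max())    # this vector's own range
    delta = (hi - lo) / (2**bits - 1) or 1.0   # per-vector step; guard hi == lo
    codes = np.round((r - lo) / delta).astype(np.uint8)
    return codes, lo, delta                    # lo and delta are stored per vector

def lvq_reconstruct(codes, lo, delta, mean):
    """Approximately invert lvq_quantize (error <= delta / 2 per component)."""
    return mean + lo + codes.astype(np.float32) * delta

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128)).astype(np.float32)
mu = X.mean(axis=0)                            # global mean, as in plain LVQ
codes, lo, delta = lvq_quantize(X[0], mu)
print(np.abs(lvq_reconstruct(codes, lo, delta, mu) - X[0]).max())
```

Because `lo` and `delta` are chosen per vector, the quantization grid tracks each vector's own dynamic range, which is plausibly what keeps the scheme effective when the data distribution drifts in a streaming database.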
Related papers
- GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction [1.1599570446840546]
Cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly.
It is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors.
We present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search.
arXiv Detail & Related papers (2024-10-14T21:14:27Z)
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption.
Our evaluation shows that RetrievalAttention only needs to access 1--3% of data while maintaining high model accuracy.
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
- Efficient Data Access Paths for Mixed Vector-Relational Search [8.80592433569832]
Machine learning and the adoption of data processing methods using vector embeddings have sparked great interest in creating systems for vector data management.
While the predominant approach of vector data management is to use specialized index structures for fast search over the entirety of the vector embeddings, once combined with other (meta)data, the search queries can also become selective on relational attributes.
As using vector indexes differs from traditional relational data access, we revisit and analyze alternative access paths for efficient mixed vector-relational search.
arXiv Detail & Related papers (2024-03-23T11:34:17Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
arXiv Detail & Related papers (2023-12-26T21:14:59Z)
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
- Similarity search in the blink of an eye with compressed indices [3.39271933237479]
Graph-based indices are currently the best performing techniques for billion-scale similarity search.
We present new techniques and systems for creating faster and smaller graph-based indices.
arXiv Detail & Related papers (2023-04-07T23:10:39Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector interaction, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Indexing methods have to trade off between obtaining high accuracy and maintaining load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)