Cost-Effective, Low Latency Vector Search with Azure Cosmos DB
- URL: http://arxiv.org/abs/2505.05885v1
- Date: Fri, 09 May 2025 08:53:59 GMT
- Title: Cost-Effective, Low Latency Vector Search with Azure Cosmos DB
- Authors: Nitish Upreti, Krishnan Sundaram, Hari Sudan Sundar, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Subramanyam Pattipaka, Tamara Hass, Nitesh Dudhey, James Codella, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant Gupta, Abhishek Sahu, Ritika Mor, Santosh Kulkarni, Hemeswari Varada, Sudhanshu Barthwal, Amar Sagare, Dinesh Billa, Zishan Fu, Neil Deshpande, Shaun Cooper, Kevin Pilch, Simon Moreno, Aayush Kataria, Vipul Vishal, Harsha Vardhan Simhadri
- Abstract summary: We argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB. This system uses a single vector index per partition stored in existing index trees, and kept in sync with underlying data. It supports < 20ms query latency over an index spanning 10 million vectors, has stable recall over updates, and offers nearly 15x and 41x lower query cost compared to Zilliz and Pinecone serverless enterprise products.
- Score: 14.766278554685776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB while leveraging the benefits of a distributed database such as high availability, durability, and scale. We do this by deeply integrating DiskANN, a state-of-the-art vector indexing library, inside Azure Cosmos DB NoSQL. This system uses a single vector index per partition stored in existing index trees, and kept in sync with underlying data. It supports < 20ms query latency over an index spanning 10 million vectors, has stable recall over updates, and offers nearly 15x and 41x lower query cost compared to Zilliz and Pinecone serverless enterprise products. It also scales out to billions of vectors via automatic partitioning. This convergent design presents a point in favor of integrating vector indices into operational databases in the context of recent debates on specialized vector databases, and offers a template for vector indexing in other databases.
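The abstract describes one vector index per partition and scale-out via partitioning. A minimal sketch of how a top-k query over such a layout could be answered is given below. It is an illustration under stated assumptions, not the paper's implementation: the function names `search_partition` and `search_all` are hypothetical, partitions are modeled as in-memory dictionaries, and a brute-force scan stands in for the per-partition DiskANN index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_partition(
    partition: Dict[str, Vector], query: Vector, k: int
) -> List[Tuple[float, str]]:
    """Return the k nearest (distance, doc_id) pairs within one partition.

    Assumption: an exact scan replaces the approximate per-partition
    index described in the abstract.
    """
    scored = ((l2(vec, query), doc_id) for doc_id, vec in partition.items())
    return heapq.nsmallest(k, scored)


def search_all(
    partitions: List[Dict[str, Vector]], query: Vector, k: int
) -> List[Tuple[float, str]]:
    """Query every partition for its local top-k, then merge globally."""
    candidates: List[Tuple[float, str]] = []
    for partition in partitions:
        candidates.extend(search_partition(partition, query, k))
    return heapq.nsmallest(k, candidates)


# Hypothetical data: two partitions holding 2-dimensional vectors.
partitions = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [1.0, 0.0], "d": [9.0, 9.0]},
]
top2 = search_all(partitions, [0.0, 0.0], k=2)
print([doc_id for _, doc_id in top2])
```

Each partition only needs to return its own k best candidates, because any document in the global top-k is necessarily in the top-k of the partition that holds it.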
Related papers
- LEANN: A Low-Storage Vector Index [70.13770593890655]
LEANN is a storage-efficient approximate nearest neighbor search index optimized for resource-constrained personal devices. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data, achieving up to 50 times smaller storage than standard indexes.
arXiv Detail & Related papers (2025-06-09T22:43:30Z)
- HAKES: Scalable Vector Database for Embedding Search Service [16.034584281180006]
We build a vector database that achieves high throughput and high recall under concurrent read-write workloads. Our index outperforms index baselines in the high recall region and under concurrent read-write workloads. HAKES is scalable and achieves up to $16\times$ higher throughput than the baselines.
arXiv Detail & Related papers (2025-05-18T19:26:29Z)
- Bang for the Buck: Vector Search on Cloud CPUs [0.0]
We show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, in an IVF index on float32 vectors, AMD's Zen4 gives almost 3x more queries per second (QPS) compared to Intel's Sapphire Rapids. We hope to guide users in getting the best "bang for the buck" when deploying vector search systems.
arXiv Detail & Related papers (2025-05-12T14:44:21Z)
- MINT: Multi-Vector Search Index Tuning [11.309615417231498]
We develop algorithms to find indexes that minimize latency and meet storage and recall constraints. Compared to the baseline, our latency achieves 2.1X to 8.3X speedup.
arXiv Detail & Related papers (2025-04-28T17:36:06Z)
- SPFresh: Incremental In-Place Update for Billion-Scale Vector Search [19.245438083030006]
We introduce SPFresh, a system that supports in-place vector updates.
At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol.
With LIRE, SPFresh provides superior query latency and accuracy to solutions based on global rebuild.
arXiv Detail & Related papers (2024-10-18T13:24:18Z)
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z)
- Locally-Adaptive Quantization for Streaming Vector Search [1.151101202055732]
Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector compression method, yields state-of-the-art search performance for non-evolving databases.
We introduce two improvements of LVQ: Turbo LVQ and multi-means LVQ that boost its search performance by up to 28% and 27%.
Our studies show that LVQ and its new variants enable blazing fast vector search, outperforming its closest competitor by up to 9.4x for identically distributed data.
arXiv Detail & Related papers (2024-02-03T05:43:39Z)
- The Faiss library [54.589857872477445]
Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper describes the trade-off space of vector search and the design principles of Faiss in terms of structure, approach to optimization and interfacing.
arXiv Detail & Related papers (2024-01-16T11:12:36Z)
- Curator: Efficient Indexing for Multi-Tenant Vector Databases [13.54470338115846]
This paper presents Curator, an in-memory vector index design tailored for multi-tenant queries.
Curator indexes each tenant's vectors with a tenant-specific clustering tree and encodes these trees compactly as sub-trees of a shared clustering tree.
Our evaluation, based on two widely used data sets, confirms that Curator delivers search performance on par with per-tenant indexing.
arXiv Detail & Related papers (2024-01-13T17:08:09Z)
- LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
arXiv Detail & Related papers (2023-12-26T21:14:59Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods have to trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
- The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that refining with machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.