Bang for the Buck: Vector Search on Cloud CPUs
- URL: http://arxiv.org/abs/2505.07621v1
- Date: Mon, 12 May 2025 14:44:21 GMT
- Title: Bang for the Buck: Vector Search on Cloud CPUs
- Authors: Leonardo Kuffo, Peter Boncz
- Abstract summary: We show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, in an IVF index on float32 vectors, AMD's Zen4 gives almost 3x more queries per second (QPS) compared to Intel's Sapphire Rapids. We hope to guide users in getting the best "bang for the buck" when deploying vector search systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vector databases have emerged as a new type of system that supports efficient querying of high-dimensional vectors. Many of these offer their database as a service in the cloud. However, the variety of available CPUs and the lack of vector search benchmarks across CPUs make it difficult for users to choose one. In this study, we show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, in an IVF index on float32 vectors, AMD's Zen4 gives almost 3x more queries per second (QPS) than Intel's Sapphire Rapids, but for HNSW indexes, the tables turn. However, when looking at the number of queries per dollar (QP$), Graviton3 is the best option for most indexes and quantization settings, even over Graviton4 (Table 1). With this work, we hope to guide users in getting the best "bang for the buck" when deploying vector search systems.
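To make the cost metric concrete: queries per dollar is just sustained QPS scaled by the instance's hourly price, QP$ = QPS x 3600 / hourly price. A minimal sketch of that arithmetic; the QPS figures and on-demand prices below are hypothetical placeholders, not the paper's measurements (those are in its Table 1):

```python
# Queries per dollar: QP$ = QPS * 3600 / hourly price.
# QPS figures and USD-per-hour prices are illustrative placeholders only.
INSTANCES = {
    # name: (measured QPS, USD per hour)
    "zen4": (3000.0, 0.77),
    "sapphire_rapids": (1100.0, 0.77),
    "graviton3": (1800.0, 0.58),
}

def queries_per_dollar(qps: float, usd_per_hour: float) -> float:
    """Queries sustained over one hour, divided by that hour's cost."""
    return qps * 3600.0 / usd_per_hour

for name, (qps, price) in sorted(INSTANCES.items()):
    print(f"{name:16s} {queries_per_dollar(qps, price):12.0f} QP$")
```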
Related papers
- KBest: Efficient Vector Search on Kunpeng CPU
KBest is a vector search library tailored for the latest Huawei Kunpeng 920 CPUs. To be efficient, KBest incorporates extensive hardware-aware and algorithmic optimizations. Experimental results show that KBest outperforms SOTA vector search libraries running on x86 CPUs.
arXiv Detail & Related papers (2025-08-05T02:52:15Z)
- HAKES: Scalable Vector Database for Embedding Search Service
We build a vector database that achieves high throughput and high recall under concurrent read-write workloads. Our index outperforms index baselines in the high-recall region and under concurrent read-write workloads. HAKES is scalable and achieves up to 16x higher throughput than the baselines.
arXiv Detail & Related papers (2025-05-18T19:26:29Z)
- Cost-Effective, Low Latency Vector Search with Azure Cosmos DB
We argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB. This system uses a single vector index per partition, stored in existing index trees and kept in sync with the underlying data. It supports 20ms query latency over an index spanning 10 million vectors, has stable recall over updates, and offers nearly 15x and 41x lower query cost compared to Disk and Pinecone serverless enterprise products.
arXiv Detail & Related papers (2025-05-09T08:53:59Z)
- MINT: Multi-Vector Search Index Tuning
We develop algorithms to find indexes that minimize latency and meet storage and recall constraints. Compared to the baseline, our indexes achieve 2.1x to 8.3x latency speedups.
arXiv Detail & Related papers (2025-04-28T17:36:06Z)
- Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
Lossy compression has been applied extensively to reduce the size of indexes. For inverted file and graph-based indices, auxiliary data such as vector ids and links can represent most of the storage cost. We show that for some datasets, these methods can also compress the quantized vector codes losslessly.
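That storage cost is easy to attack because an IVF list can store its vector ids sorted: take gaps between consecutive ids and variable-byte pack them, losing nothing. A minimal sketch of that classic baseline scheme (the paper's codecs are more sophisticated):

```python
def varbyte_encode(ids: list[int]) -> bytes:
    """Sort, delta-encode, and variable-byte pack vector ids.
    Classic lossless baseline; not the paper's exact codec."""
    out = bytearray()
    prev = 0
    for vid in sorted(ids):
        gap = vid - prev
        prev = vid
        while gap >= 0x80:                   # 7 payload bits per byte,
            out.append((gap & 0x7F) | 0x80)  # high bit = "more bytes follow"
            gap >>= 7
        out.append(gap)
    return bytes(out)

def varbyte_decode(data: bytes) -> list[int]:
    ids, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur                      # undo the delta encoding
            ids.append(prev)
            cur, shift = 0, 0
    return ids

assert varbyte_decode(varbyte_encode([9, 1000000, 42])) == [9, 42, 1000000]
```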
arXiv Detail & Related papers (2025-01-16T20:45:11Z)
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z)
- Locally-Adaptive Quantization for Streaming Vector Search
Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector compression method, yields state-of-the-art search performance for non-evolving databases.
We introduce two improvements to LVQ, Turbo LVQ and multi-means LVQ, that boost its search performance by up to 28% and 27%, respectively.
Our studies show that LVQ and its new variants enable blazing fast vector search, outperforming their closest competitor by up to 9.4x for identically distributed data.
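The "locally-adaptive" part means each vector carries its own scalar-quantization constants rather than sharing global ones. A minimal NumPy sketch of that per-vector idea (an illustration of the concept, not Intel's actual LVQ implementation, which packs bits far more carefully):

```python
import numpy as np

def lvq_style_quantize(x: np.ndarray, bits: int = 8):
    """Per-vector scalar quantization: every vector stores its own
    (min, step) pair so the integer codes adapt to its local range."""
    levels = (1 << bits) - 1
    lo = x.min(axis=1, keepdims=True)                   # per-vector minimum
    step = (x.max(axis=1, keepdims=True) - lo) / levels
    step = np.maximum(step, 1e-12)                      # guard constant vectors
    codes = np.round((x - lo) / step).astype(np.uint8)
    return codes, lo, step

def lvq_style_dequantize(codes, lo, step):
    return lo + codes * step

x = np.random.randn(4, 128).astype(np.float32)
codes, lo, step = lvq_style_quantize(x)
print(np.abs(lvq_style_dequantize(codes, lo, step) - x).max())  # <= step/2
```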
arXiv Detail & Related papers (2024-02-03T05:43:39Z)
- The Faiss library
Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper describes the trade-off space of vector search and the design principles of Faiss in terms of structure, approach to optimization and interfacing.
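Faiss exposes both index families compared in the main paper. A brief sketch that builds an IVF and an HNSW index over the same float32 vectors, assuming the faiss-cpu package is installed; data and parameters are illustrative defaults, not a benchmark configuration:

```python
# pip install faiss-cpu numpy
import numpy as np
import faiss

d = 128
xb = np.random.rand(100_000, d).astype("float32")   # base float32 vectors
xq = np.random.rand(1_000, d).astype("float32")     # query vectors

# IVF: k-means the base vectors into nlist clusters; scan nprobe per query.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16                       # clusters probed per query

# HNSW: layered proximity graph; no training step required.
hnsw = faiss.IndexHNSWFlat(d, 32)     # 32 = graph degree M
hnsw.hnsw.efSearch = 64               # search-time beam width
hnsw.add(xb)

for name, index in [("IVF", ivf), ("HNSW", hnsw)]:
    distances, ids = index.search(xq, 10)   # top-10 neighbours per query
    print(name, ids.shape)
```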
arXiv Detail & Related papers (2024-01-16T11:12:36Z)
- LeanVec: Searching vectors faster by making them fit
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors.
We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
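The two-stage recipe is straightforward to picture: project the vectors into a lower-dimensional space with a linear map, then quantize the result. A minimal NumPy sketch using plain PCA as a stand-in for LeanVec's learned projections (illustrative only, not the paper's method):

```python
import numpy as np

def leanvec_style_compress(xb: np.ndarray, target_dim: int):
    """Linear dimensionality reduction (here PCA, standing in for
    LeanVec's learned projections) followed by int8 quantization."""
    mean = xb.mean(axis=0)
    # Principal directions from the SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(xb - mean, full_matrices=False)
    proj = vt[:target_dim].T                  # shape (d, target_dim)
    reduced = (xb - mean) @ proj
    scale = np.abs(reduced).max() / 127.0
    codes = np.round(reduced / scale).astype(np.int8)
    return codes, mean, proj, scale           # enough to project queries too

xb = np.random.randn(5_000, 256).astype(np.float32)
codes, mean, proj, scale = leanvec_style_compress(xb, 64)
print(codes.shape, codes.dtype)               # (5000, 64) int8
```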
arXiv Detail & Related papers (2023-12-26T21:14:59Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedups over CPU and GPU baselines, respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
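What makes the least-squares variant attractive for accelerators is that training collapses to a single dense linear solve instead of a quadratic program. A minimal NumPy sketch of the classical LS-SVM formulation (a toy illustration, unrelated to PLSSVM's actual multi-GPU code):

```python
import numpy as np

def lssvm_fit(X, y, gamma=1.0, C=10.0):
    """Least Squares SVM: training solves the single linear system
    [[0, 1^T], [1, K + I/C]] @ [b; alpha] = [0; y]."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                       # RBF kernel matrix
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                        # bias b, coefficients alpha

def lssvm_predict(X_train, X, b, alpha, gamma=1.0):
    sq = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.sign(np.exp(-gamma * sq) @ alpha + b)

X = np.random.randn(200, 2)
y = np.sign(X[:, 0] * X[:, 1])                    # toy labels in {-1, +1}
b, alpha = lssvm_fit(X, y)
print((lssvm_predict(X, X, b, alpha) == y).mean())  # training accuracy
```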
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
- IRLI: Iterative Re-partitioning for Learning to Index
Methods must trade off accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
- The Case for Learned Spatial Indexes
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension. We also show that (ii) refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor which filters on two dimensions.
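The "machine-learned search within a partition" can be pictured in one dimension: fit a model from key to sorted position, then correct within the model's worst-case error window. A minimal sketch of that idea (Flood itself is multi-dimensional and more elaborate):

```python
import bisect
import numpy as np

class LearnedPartitionSearch:
    """Fit a linear model key -> sorted position, then binary-search only
    inside the model's error window; a 1-D sketch, not the paper's code."""
    def __init__(self, keys: np.ndarray):
        self.keys = np.sort(keys)
        pos = np.arange(len(self.keys))
        self.slope, self.icept = np.polyfit(self.keys, pos, 1)
        pred = self.slope * self.keys + self.icept
        self.err = int(np.ceil(np.abs(pred - pos).max()))  # max model error

    def lookup(self, key: float) -> int:
        guess = int(self.slope * key + self.icept)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        # Binary search restricted to the model's error window.
        return lo + bisect.bisect_left(self.keys[lo:hi], key)

idx = LearnedPartitionSearch(np.random.rand(100_000))
i = idx.lookup(0.5)
print(i, idx.keys[i])   # position of the first key >= 0.5
```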
arXiv Detail & Related papers (2020-08-24T12:09:55Z)