MicroNN: An On-device Disk-resident Updatable Vector Database
- URL: http://arxiv.org/abs/2504.05573v1
- Date: Tue, 08 Apr 2025 00:05:58 GMT
- Title: MicroNN: An On-device Disk-resident Updatable Vector Database
- Authors: Jeffrey Pound, Floris Chabert, Arjun Bhushan, Ankur Goswami, Anil Pacaci, Shihabur Rahman Chowdhury,
- Abstract summary: Micro Nearest Neighbour (MicroNN) is an embedded nearest-neighbour vector search engine for scalable similarity search in low-resource environments. MicroNN addresses the problem of on-device vector search for real-world workloads containing updates and hybrid search queries. MicroNN takes less than 7 ms to retrieve the top-100 nearest neighbours with 90% recall on a publicly available million-scale vector benchmark.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search over large vector collections is a well-studied problem with many existing approaches and open source implementations. However, most state-of-the-art systems are generally targeted towards scenarios using large servers with an abundance of memory, static vector collections that are not updatable, and nearest neighbour search in isolation from other search criteria. We present Micro Nearest Neighbour (MicroNN), an embedded nearest-neighbour vector search engine designed for scalable similarity search in low-resource environments. MicroNN addresses the problem of on-device vector search for real-world workloads containing updates and hybrid search queries that combine nearest neighbour search with structured attribute filters. In this scenario, memory is highly constrained, so disk-efficient index structures and algorithms are required, as well as support for continuous inserts and deletes. MicroNN is an embeddable library that can scale to large vector collections with minimal resources. MicroNN is used in production and powers a wide range of vector search use-cases on-device. MicroNN takes less than 7 ms to retrieve the top-100 nearest neighbours with 90% recall on a publicly available million-scale vector benchmark while using ~10 MB of memory.
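To make the hybrid-query workload concrete, here is a minimal brute-force sketch of the kind of query MicroNN serves: top-k nearest-neighbour search combined with a structured attribute filter. This is not MicroNN's disk-resident implementation; the toy schema, data, and function names are illustrative assumptions.

```python
# A brute-force stand-in for hybrid search: filter rows by a structured
# attribute, then rank the survivors by distance to the query vector.
# A real engine like MicroNN pushes both steps into a disk-resident index.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 64)).astype(np.float32)
category = rng.integers(0, 5, size=len(vectors))   # toy structured attribute

def hybrid_search(query, attr_value, k=100):
    """Top-k nearest neighbours among rows whose attribute matches."""
    ids = np.flatnonzero(category == attr_value)    # attribute filter
    dists = np.linalg.norm(vectors[ids] - query, axis=1)
    top = np.argsort(dists)[:k]
    return ids[top], dists[top]

query = rng.standard_normal(64).astype(np.float32)
ids, dists = hybrid_search(query, attr_value=3, k=10)
print(ids, dists)
```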
Related papers
- On Storage Neural Network Augmented Approximate Nearest Neighbor Search [1.3654846342364308]
It is necessary to search for the vectors most similar to a given query vector in data stored on storage devices, rather than in memory. In this paper, we propose a method to predict the correct clusters by means of a neural network. Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M while fetching 80% and 58% less data from storage, respectively.
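As a rough illustration of the summary above, the sketch below clusters the dataset, labels training queries with the cluster that holds their exact nearest neighbour, and trains a classifier so that only the predicted clusters need to be fetched at query time. The classifier choice, cluster count, and probe count are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
data = rng.standard_normal((5_000, 32)).astype(np.float32)
km = KMeans(n_clusters=50, n_init=4, random_state=0).fit(data)

# Label each training query with the cluster of its exact nearest neighbour.
train_q = rng.standard_normal((2_000, 32)).astype(np.float32)
d2 = ((train_q ** 2).sum(1, keepdims=True)
      - 2.0 * train_q @ data.T + (data ** 2).sum(1))
clf = LogisticRegression(max_iter=1000).fit(train_q, km.labels_[d2.argmin(1)])

def search(query, n_probe=3, k=10):
    """Fetch only the n_probe most likely clusters, then scan them."""
    probs = clf.predict_proba(query[None])[0]
    probe = clf.classes_[np.argsort(probs)[::-1][:n_probe]]
    cand = np.flatnonzero(np.isin(km.labels_, probe))
    dist = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(dist)[:k]]

print(search(rng.standard_normal(32).astype(np.float32)))
```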
arXiv Detail & Related papers (2025-01-23T06:56:18Z)
- SPFresh: Incremental In-Place Update for Billion-Scale Vector Search [19.245438083030006]
We introduce SPFresh, a system that supports in-place vector updates.
At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol.
With LIRE, SPFresh provides superior query latency and accuracy to solutions based on global rebuild.
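LIRE's full protocol is not spelled out in the summary, but the core contrast with a global rebuild can be sketched: inserts land in the nearest partition, and only an oversized partition is split in place. The threshold and the 2-means split below are my assumptions; the real protocol also handles deletes and cross-partition reassignment.

```python
# Toy local rebalancing: an insert touches one partition, and a split
# touches only that partition, never the whole index.
import numpy as np

rng = np.random.default_rng(0)
MAX_LEN = 200
centroids = [rng.standard_normal(16)]
partitions = [[]]                       # partitions[i] belongs to centroids[i]

def insert(v):
    i = int(np.argmin([np.linalg.norm(v - c) for c in centroids]))
    partitions[i].append(v)
    if len(partitions[i]) > MAX_LEN:
        split(i)

def split(i):
    """Local 2-means split of a single oversized partition."""
    pts = np.array(partitions[i])
    a, b = pts[0], pts[-1]              # crude seeds
    for _ in range(5):                  # a few Lloyd iterations
        left = (np.linalg.norm(pts - a, axis=1)
                <= np.linalg.norm(pts - b, axis=1))
        if left.all() or not left.any():
            left = np.arange(len(pts)) % 2 == 0   # degenerate-split guard
        a, b = pts[left].mean(0), pts[~left].mean(0)
    centroids[i], partitions[i] = a, list(pts[left])
    centroids.append(b)
    partitions.append(list(pts[~left]))

for _ in range(1_000):
    insert(rng.standard_normal(16))
print(len(centroids), sorted(len(p) for p in partitions))
```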
arXiv Detail & Related papers (2024-10-18T13:24:18Z)
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption. We show that RetrievalAttention achieves near full attention accuracy while only requiring access to 1--3% of the data.
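The mechanism, as far as the summary states it, is to attend over a retrieved subset of the cached keys instead of all of them. A numpy sketch of that idea follows; the top-k search is done by an exhaustive score pass here, whereas the paper uses an approximate index precisely to avoid touching most of the KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx, d = 100_000, 64
K = rng.standard_normal((n_ctx, d)).astype(np.float32)   # cached keys
V = rng.standard_normal((n_ctx, d)).astype(np.float32)   # cached values

def retrieval_attention(q, k=1_024):
    """Softmax attention over only the top-k keys (~1% of the context)."""
    scores = K @ q                      # stand-in for an ANN index lookup
    top = np.argpartition(scores, -k)[-k:]
    s = scores[top] / np.sqrt(d)
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V[top]

q = rng.standard_normal(d).astype(np.float32)
print(retrieval_attention(q).shape)     # (64,)
```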
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings [15.275864151890511]
We introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism which reduces multi-vector search to single-vector similarity search.
MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets.
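A hedged sketch of a fixed dimensional encoding: hash each vector into a bucket via random hyperplanes, aggregate per bucket, and concatenate, so a set of vectors becomes one vector and multi-vector similarity reduces to a single inner product. The bucket count and the sum/mean asymmetry below are simplifications; the paper gives the actual construction and its guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_planes = 32, 4                          # 2**4 = 16 buckets
planes = rng.standard_normal((n_planes, d))

def fde(X, reduce):
    """One aggregated vector per hyperplane bucket, concatenated."""
    buckets = (X @ planes.T > 0).astype(int) @ (1 << np.arange(n_planes))
    out = np.zeros((2 ** n_planes, d))
    for b in range(2 ** n_planes):
        rows = X[buckets == b]
        if len(rows):
            out[b] = rows.sum(0) if reduce == "sum" else rows.mean(0)
    return out.ravel()

query_vecs = rng.standard_normal((8, d))     # e.g. one vector per query token
doc_vecs = rng.standard_normal((50, d))
print(fde(query_vecs, "sum") @ fde(doc_vecs, "mean"))   # single-vector score
```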
arXiv Detail & Related papers (2024-05-29T20:40:20Z)
- The Impacts of Data, Ordering, and Intrinsic Dimensionality on Recall in Hierarchical Navigable Small Worlds [0.09208007322096533]
The investigation focuses on HNSW's efficacy across a spectrum of datasets.
We discover that the recall of approximate HNSW search, in comparison to exact K Nearest Neighbours (KNN) search, is linked to the vector space's intrinsic dimensionality.
We observe that running popular benchmark datasets with HNSW instead of KNN can shift rankings by up to three positions for some models.
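The measurement behind these findings is recall of the approximate result list against exact KNN ground truth. A minimal sketch of that metric is below; the approximate side is just a perturbed id list, since reproducing HNSW itself is out of scope here.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 32))
query = rng.standard_normal(32)

exact = np.argsort(np.linalg.norm(data - query, axis=1))[:10]
approx = exact.copy()
approx[7] = 9_999                         # simulate one missed neighbour
print(recall_at_k(approx, exact, k=10))   # 0.9
```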
arXiv Detail & Related papers (2024-05-28T04:16:43Z)
- Vector search with small radiuses [10.880913075221361]
This paper focuses on the common case where a hard decision needs to be taken depending on the vector retrieval results.
We show that the value of a range search result can be modeled rigorously based on the query-to-vector distance.
This yields a metric for range search, RSM, that is both principled and easy to compute without running an end-to-end evaluation.
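Range search returns everything within a radius rather than a fixed top-k. The sketch below shows the operation plus a generic distance-weighted value for a result set, only to convey the shape of such a metric; the actual RSM definition is in the paper and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 16))

def range_search(query, radius):
    """All vectors within `radius` of the query, not a fixed top-k."""
    d = np.linalg.norm(data - query, axis=1)
    ids = np.flatnonzero(d <= radius)
    return ids, d[ids]

def set_value(dists, radius):
    """Illustrative only: closer hits count more, hits at the radius ~0."""
    return float(np.sum(1.0 - dists / radius))

ids, dists = range_search(rng.standard_normal(16), radius=4.5)
print(len(ids), set_value(dists, 4.5))
```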
arXiv Detail & Related papers (2024-03-16T00:34:25Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
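The summary describes mapping outputs of a frozen dense model into sparse lexical vectors. A hedged sketch of that transformation follows: project to a large vocabulary-sized space, apply ReLU, and keep the top-k activations. The random projection stands in for the learned mapping, and the paper additionally controls expansion probabilistically.

```python
import numpy as np

rng = np.random.default_rng(0)
d_dense, vocab, k = 256, 30_000, 64
W = rng.standard_normal((d_dense, vocab)) / np.sqrt(d_dense)  # stand-in mapping

def dense_to_sparse(v):
    """Return (indices, weights) of a k-sparse lexical-style vector."""
    acts = np.maximum(v @ W, 0.0)          # ReLU keeps weights non-negative
    idx = np.argpartition(acts, -k)[-k:]
    idx = idx[acts[idx] > 0]               # drop inactive dimensions
    return idx, acts[idx]

idx, w = dense_to_sparse(rng.standard_normal(d_dense))
print(len(idx), idx[:5], np.round(w[:5], 3))
```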
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition [77.4863142882136]
Cross-encoder models are prohibitively expensive for direct k-nearest neighbor (k-NN) search.
We propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors.
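ADACUR's actual procedure builds a CUR-style low-rank approximation of the query-item score matrix; the sketch below only conveys the adaptive, multi-round shape of the idea: spend expensive cross-encoder calls in rounds, using a cheap model fit on the scores seen so far to pick the next candidates, so approximation error is minimized where it matters, near the top-k. The linear proxy and the toy scorer are my simplifications, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 5_000, 32
item_emb = rng.standard_normal((n_items, d))
true_w = rng.standard_normal(d)

def cross_encoder_score(item_id):
    """Stand-in for an expensive cross-encoder call."""
    return float(item_emb[item_id] @ true_w + 0.1 * rng.standard_normal())

def adaptive_topk(k=10, rounds=4, budget_per_round=50):
    # Seed with a random exactly-scored sample, then refine in rounds.
    scored = {int(i): cross_encoder_score(int(i))
              for i in rng.choice(n_items, 50, replace=False)}
    for _ in range(rounds):
        ids = np.array(list(scored))
        ys = np.array([scored[i] for i in ids])
        w, *_ = np.linalg.lstsq(item_emb[ids], ys, rcond=None)  # cheap proxy
        est = item_emb @ w
        est[ids] = -np.inf                  # never re-score an item
        for i in np.argsort(est)[-budget_per_round:]:
            scored[int(i)] = cross_encoder_score(int(i))
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(adaptive_topk())
```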
arXiv Detail & Related papers (2023-05-04T17:01:17Z)
- BR-NS: an Archive-less Approach to Novelty Search [70.13948372218849]
We discuss an alternative approach to novelty estimation, dubbed Behavior Recognition based Novelty Search (BR-NS).
BR-NS does not require an archive, makes no assumptions about the metrics that can be defined in the behavior space, and does not rely on nearest-neighbour search.
We conduct experiments to gain insight into its feasibility and dynamics as well as potential advantages over archive-based NS in terms of time complexity.
arXiv Detail & Related papers (2021-04-08T17:31:34Z)
- SOLAR: Sparse Orthogonal Learned and Random Embeddings [45.920844071257754]
We argue that high-dimensional, ultra-sparse embeddings are a significantly superior alternative to dense low-dimensional embeddings for both query efficiency and accuracy.
We train 500K dimensional SOLAR embeddings for the tasks of searching through 1.6M books and multi-label classification on the three largest public datasets.
We achieve superior precision and recall compared to the respective state-of-the-art baselines for each of the tasks, while being up to 10 times faster.
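Ultra-sparse high-dimensional embeddings are queried with an inverted index rather than dense distance scans, which is where the speed comes from. A minimal sketch of that query path follows; random k-sparse vectors stand in for the learned 500K-dimensional SOLAR embeddings.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, k_active, n_docs = 500_000, 32, 10_000

def random_sparse(n):
    """Stand-in for learned embeddings: n vectors, k_active hot dims each."""
    return [rng.choice(dim, k_active, replace=False) for _ in range(n)]

docs = random_sparse(n_docs)
inverted = defaultdict(list)              # active dimension -> doc ids
for doc_id, dims in enumerate(docs):
    for dmn in dims:
        inverted[dmn].append(doc_id)

def search(query_dims, k=5):
    """Score = number of shared active dimensions (binary dot product)."""
    hits = defaultdict(int)
    for dmn in query_dims:
        for doc_id in inverted[dmn]:
            hits[doc_id] += 1
    return sorted(hits, key=hits.get, reverse=True)[:k]

print(search(docs[123]))                  # doc 123 should rank first
```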
arXiv Detail & Related papers (2020-08-30T17:35:35Z)
- The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, and (ii) refining with machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
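The "machine-learned search within a partition" result compares a learned position predictor against binary search over sorted data. A minimal sketch of that idea, in the style of learned indexes like Flood's per-dimension models: fit a model from key to array position, jump to the prediction, and correct with a bounded search around it. The linear model and error bound are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.uniform(0.0, 1_000.0, 100_000))  # one sorted dimension
positions = np.arange(len(keys))
slope, intercept = np.polyfit(keys, positions, 1)   # learned model: pos ~ f(key)

def learned_lookup(key, max_err=2_000):
    """Jump to the predicted position, then search only a small window."""
    guess = int(np.clip(slope * key + intercept, 0, len(keys) - 1))
    lo, hi = max(0, guess - max_err), min(len(keys), guess + max_err)
    return lo + int(np.searchsorted(keys[lo:hi], key))

i = learned_lookup(512.7)
print(keys[i - 1], keys[i])     # the query key falls between these
```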
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
- Searching towards Class-Aware Generators for Conditional Generative Adversarial Networks [132.29772160843825]
Conditional Generative Adversarial Networks (cGAN) were designed to generate images based on the provided conditions.
Existing methods have used the same generator architecture for all classes.
This paper presents a novel idea that adopts NAS to find a distinct architecture for each class.
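A toy of the per-class idea: run a separate (here, purely random) architecture search for each class so every class ends up with its own generator configuration. The search space and the stand-in fitness function are illustrative; the paper uses an actual NAS procedure with real GAN training.

```python
import random

random.seed(0)
SEARCH_SPACE = {
    "n_blocks": [2, 3, 4],
    "width": [64, 128, 256],
    "upsample": ["nearest", "bilinear", "deconv"],
}

def sample_arch():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(arch, class_id):
    """Stand-in for 'train this generator on this class, measure quality'."""
    return random.random()

def search_per_class(n_classes=5, trials=20):
    best = {}
    for c in range(n_classes):
        scored = []
        for _ in range(trials):
            arch = sample_arch()
            scored.append((fitness(arch, c), arch))
        best[c] = max(scored, key=lambda t: t[0])[1]  # keep best arch per class
    return best

for c, arch in search_per_class().items():
    print(c, arch)
```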
arXiv Detail & Related papers (2020-06-25T07:05:28Z)