LEANN: A Low-Storage Vector Index
- URL: http://arxiv.org/abs/2506.08276v1
- Date: Mon, 09 Jun 2025 22:43:30 GMT
- Title: LEANN: A Low-Storage Vector Index
- Authors: Yichuan Wang, Shu Liu, Zhifei Li, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez,
- Abstract summary: LEANN is a storage-efficient approximate nearest neighbor search index optimized for resource-constrained personal devices. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data, achieving up to 50 times smaller storage than standard indexes.
- Score: 70.13770593890655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding-based search is widely used in applications such as recommendation and retrieval-augmented generation (RAG). Recently, there has been growing demand to support these capabilities over personal data stored locally on devices. However, maintaining the data structures required for embedding-based search is often infeasible due to their high storage overhead. For example, indexing 100 GB of raw data requires 150 to 700 GB of storage, making local deployment impractical. Reducing this overhead while maintaining search quality and latency becomes a critical challenge. In this paper, we present LEANN, a storage-efficient approximate nearest neighbor (ANN) search index optimized for resource-constrained personal devices. LEANN combines a compact graph-based structure with an efficient on-the-fly recomputation strategy to enable fast and accurate retrieval with minimal storage overhead. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data, achieving up to 50 times smaller storage than standard indexes, while maintaining 90% top-3 recall in under 2 seconds on real-world question answering benchmarks.
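The trade-off the abstract describes, storing only a compact graph and raw text while recomputing embeddings on the fly during traversal, can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the toy `embed` function, the graph layout, and the distance metric are all assumptions standing in for a real embedding model and index.

```python
import heapq
import math

def embed(text):
    # Stand-in for an embedding model (assumption, not LEANN's model).
    # In a LEANN-style system this call is the expensive on-the-fly
    # recomputation that replaces storing the vector on disk.
    h = hash(text)
    return ((h & 0xFF) / 255.0, ((h >> 8) & 0xFF) / 255.0)

def graph_search(query_vec, graph, texts, entry, k=3):
    """Best-first search over a proximity graph, keeping only raw text
    and adjacency lists and recomputing embeddings per visited node."""
    visited = {entry}
    start_d = math.dist(query_vec, embed(texts[entry]))
    frontier = [(start_d, entry)]   # min-heap of candidates by distance
    results = [(-start_d, entry)]   # max-heap holding the current top-k
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the best remaining candidate is worse than our worst kept.
        if len(results) >= k and d > -results[0][0]:
            break
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            nd = math.dist(query_vec, embed(texts[nbr]))  # recomputed, not stored
            heapq.heappush(frontier, (nd, nbr))
            heapq.heappush(results, (-nd, nbr))
            if len(results) > k:
                heapq.heappop(results)  # drop the farthest kept node
    return sorted((-negd, n) for negd, n in results)
```

The storage saving comes from the `results`/`frontier` machinery never touching stored vectors: only node ids and text are persisted, and each distance is paid for with one embedding-model call at query time.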
Related papers
- Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index [124.68209298883296]
We present Infini-gram mini, a scalable system that can make petabyte-level text corpora searchable. We index 46TB of Internet text in 50 days with a single 128-core CPU node. We show one important use case of Infini-gram mini in a large-scale analysis of benchmark contamination.
arXiv Detail & Related papers (2025-06-13T21:13:57Z) - Half Search Space is All You Need [18.672184596814077]
We propose to prune the search space in an efficient automatic manner to reduce memory consumption and search time. Specifically, we utilise Zero-Shot NAS to efficiently remove low-performing architectures from the search space. Experimental results on the DARTS search space show that our approach reduces memory consumption by 81% compared to the baseline One-Shot setup.
arXiv Detail & Related papers (2025-05-19T17:59:59Z) - HAKES: Scalable Vector Database for Embedding Search Service [16.034584281180006]
We build a vector database that achieves high throughput and high recall under concurrent read-write workloads. Our index outperforms index baselines in the high recall region and under concurrent read-write workloads. HAKES is scalable and achieves up to 16x higher throughput than the baselines.
arXiv Detail & Related papers (2025-05-18T19:26:29Z) - On Storage Neural Network Augmented Approximate Nearest Neighbor Search [1.3654846342364308]
The vectors most similar to a given query must be retrieved from data kept on storage devices rather than in memory. In this paper, we propose a method to predict the correct clusters by means of a neural network. Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
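The cluster-routing idea in this summary can be sketched as a classic inverted-file (IVF) layout: vectors are grouped into clusters on storage, and at query time only the most promising clusters are fetched and scanned. In the paper a neural network predicts which clusters to fetch; the sketch below substitutes plain nearest-centroid ranking as a stand-in predictor, so the routing rule here is an assumption, not the paper's method.

```python
import math

def build_ivf(vectors, centroids):
    """Assign each vector id to its nearest centroid (inverted lists).
    These lists model the on-storage layout of the dataset."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        j = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        lists[j].append(vid)
    return lists

def query_ivf(q, vectors, centroids, lists, n_probe=2, k=1):
    """Fetch only the n_probe most promising clusters from storage, then
    linear-scan the fetched vectors. A better predictor (e.g. the paper's
    neural network) lets n_probe shrink without losing recall."""
    ranked = sorted(range(len(centroids)), key=lambda i: math.dist(q, centroids[i]))
    candidates = [vid for c in ranked[:n_probe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: math.dist(q, vectors[vid]))[:k]
```

The reported savings correspond to reducing `n_probe` (data fetched from storage) while the predictor keeps the true nearest neighbors inside the probed clusters.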
arXiv Detail & Related papers (2025-01-23T06:56:18Z) - Efficient Long Context Language Model Retrieval with Compression [57.09163579304332]
Long Context Language Models (LCLMs) have emerged as a new paradigm to perform Information Retrieval (IR). We propose a new compression approach tailored for LCLM retrieval, which is trained to maximize the retrieval performance while minimizing the length of the compressed passages. We show that CoLoR improves the retrieval performance by 6% while compressing the in-context size by a factor of 1.91.
arXiv Detail & Related papers (2024-12-24T07:30:55Z) - Semi-Parametric Retrieval via Binary Bag-of-Tokens Index [71.78109794895065]
SemI-parametric Disentangled Retrieval (SiDR) is a bi-encoder retrieval framework that decouples the retrieval index from neural parameters. SiDR supports a non-parametric tokenization index for search, achieving BM25-like indexing complexity with significantly better effectiveness.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval [1.099532646524593]
We propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads compressed vectors to the SSD index. Our method achieves ~10 MB memory usage in query search with billion-scale datasets without critical latency degradation.
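Product quantization, the compression scheme AiSAQ offloads to SSD, splits each vector into subvectors and replaces each subvector with the index of its nearest learned centroid. The sketch below is a toy stdlib-only version with a hand-rolled per-subspace k-means; the parameters and training loop are illustrative assumptions, not AiSAQ's implementation.

```python
import random
import math

def train_codebooks(vectors, n_sub, k=4, iters=10, seed=0):
    """Toy k-means per subspace: split each vector into n_sub chunks and
    learn k centroids per chunk. Real systems use optimized k-means."""
    rng = random.Random(seed)
    sub = len(vectors[0]) // n_sub
    books = []
    for s in range(n_sub):
        chunks = [tuple(v[s*sub:(s+1)*sub]) for v in vectors]
        cents = rng.sample(chunks, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for c in chunks:
                j = min(range(k), key=lambda i: math.dist(c, cents[i]))
                groups[j].append(c)
            cents = [tuple(sum(x) / len(g) for x in zip(*g)) if g else cents[i]
                     for i, g in enumerate(groups)]
        books.append(cents)
    return books

def pq_encode(v, books):
    """Compress a vector to one small centroid index per subspace."""
    sub = len(v) // len(books)
    return [min(range(len(b)), key=lambda i: math.dist(v[s*sub:(s+1)*sub], b[i]))
            for s, b in enumerate(books)]

def pq_decode(code, books):
    """Approximate reconstruction by concatenating the chosen centroids."""
    out = []
    for s, b in enumerate(books):
        out.extend(b[code[s]])
    return out
```

With `n_sub` codes of `k` centroids each, a vector costs `n_sub * log2(k)` bits instead of full-precision floats, which is why billion-scale collections can fit on SSD with a tiny DRAM footprint.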
arXiv Detail & Related papers (2024-04-09T04:20:27Z) - Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z) - Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration [68.6505473346005]
We propose a memory-efficient hierarchical NAS, HiNAS, for image denoising and image super-resolution tasks.
With a single GTX1080Ti GPU, it takes only about 1 hour for searching for denoising network on BSD 500 and 3.5 hours for searching for the super-resolution structure on DIV2K.
arXiv Detail & Related papers (2020-12-24T12:06:17Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that refinement using machine learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - Hands-off Model Integration in Spatial Index Structures [8.710716183434918]
In this paper we explore the opportunity to use light-weight machine learning models to accelerate queries on spatial indexes.
We do so by exploring the potential of using such models and similar techniques on the R-tree, arguably the most broadly used spatial index.
As we show in our analysis, the query execution time can be reduced by up to 60% while simultaneously shrinking the index's memory footprint by over 90%.
arXiv Detail & Related papers (2020-06-29T22:05:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.