Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
- URL: http://arxiv.org/abs/2501.10479v1
- Date: Thu, 16 Jan 2025 20:45:11 GMT
- Title: Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
- Authors: Daniel Severo, Giuseppe Ottaviano, Matthew Muckley, Karen Ullrich, Matthijs Douze
- Abstract summary: Lossy compression has been applied extensively to reduce the size of indexes. For inverted file and graph-based indices, auxiliary data such as vector ids and links can represent most of the storage cost. We show that for some datasets, these methods can also compress the quantized vector codes losslessly.
- Score: 11.938555573590964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximate nearest neighbor search for vectors relies on indexes that are most often accessed from RAM. Therefore, storage is the factor limiting the size of the database that can be served from a machine. Lossy vector compression, i.e., embedding quantization, has been applied extensively to reduce the size of indexes. However, for inverted file and graph-based indices, auxiliary data such as vector ids and links (edges) can represent most of the storage cost. We introduce and evaluate lossless compression schemes for these cases. These approaches are based on asymmetric numeral systems or wavelet trees that exploit the fact that the ordering of ids is irrelevant within the data structures. In some settings, we are able to compress the vector ids by a factor of 7, with no impact on accuracy or search runtime. On billion-scale datasets, this results in a reduction of 30% of the index size. Furthermore, we show that for some datasets, these methods can also compress the quantized vector codes losslessly, by exploiting sub-optimalities in the original quantization algorithm. The source code for our approach is available at https://github.com/facebookresearch/vector_db_id_compression.
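The central observation is that the order in which ids appear inside an inverted list or neighbor list carries no information, so the bits spent encoding that order can be reclaimed. The snippet below is not the paper's ANS or wavelet-tree method; it is a simplified stand-in (sort the ids, delta-encode the gaps, and varint-pack them) that already benefits from the fact that ordering is irrelevant. All function names and sizes are illustrative.

```python
import numpy as np

def varint_encode(values):
    """Pack non-negative integers into bytes using LEB128-style varints."""
    out = bytearray()
    for v in values:
        v = int(v)
        while True:
            byte = v & 0x7F
            v >>= 7
            if v:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)

def compress_id_list(ids):
    """Order-oblivious compression of one inverted list:
    sort (the ordering is irrelevant), delta-encode, varint-pack."""
    ids = np.sort(np.asarray(ids, dtype=np.uint64))
    gaps = np.diff(ids, prepend=np.uint64(0))
    return varint_encode(gaps)

# Toy example: 10k ids drawn from a 1M-vector database.
rng = np.random.default_rng(0)
ids = rng.choice(1_000_000, size=10_000, replace=False)
raw_bytes = ids.size * 4            # 32-bit ids stored verbatim
packed = compress_id_list(ids)
print(f"raw: {raw_bytes} B, compressed: {len(packed)} B, "
      f"ratio: {raw_bytes / len(packed):.1f}x")
```

In the paper, this kind of redundancy is exploited with asymmetric numeral systems and wavelet trees rather than explicit sorting, with no impact on accuracy or search runtime.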
Related papers
- Industrial-Scale Neural Network Clone Detection with Disk-Based Similarity Search [0.24091079613649843]
Code clones are similar code fragments that often arise from copy-and-paste programming.
We extend existing neural network-based clone detection schemes to handle clones that far exceed available memory.
We demonstrate that our approach is around 2$\times$ slower than the in-memory approach for a problem size that can fit within memory.
arXiv Detail & Related papers (2025-04-24T22:50:23Z)
- PDX: A Data Layout for Vector Similarity Search [0.0]
Partition Across Dimensions (PDX) is a data layout for vectors that stores multiple vectors in one block, using a vertical layout for the dimensions.
PDX beats SIMD-optimized distance kernels on standard horizontal vector storage (on average 40% faster).
We introduce PDX-BOND, an even more flexible dimension-pruning strategy, with good performance on exact search and reasonable performance on approximate search.
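As a rough illustration of the idea (not the PDX implementation), the sketch below stores a block of vectors dimension-major and accumulates distances one dimension at a time, which is what makes early pruning of candidates possible; the block size and pruning threshold are made up for the example.

```python
import numpy as np

def block_dimension_major(vectors):
    """Store a block of vectors dimension-major (one contiguous row per dimension)."""
    return np.ascontiguousarray(vectors.T)  # shape: (d, n)

def pruned_l2(block_dm, query, threshold):
    """Accumulate squared L2 distances dimension by dimension and drop
    candidates once they exceed the threshold. Partial sums only grow,
    so no candidate within the threshold is ever discarded."""
    d, n = block_dm.shape
    dist = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    for j in range(d):
        diff = block_dm[j, alive] - query[j]
        dist[alive] += diff * diff
        alive[alive] = dist[alive] <= threshold   # prune hopeless candidates
    dist[~alive] = np.inf                         # mark pruned candidates
    return dist

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 64)).astype(np.float32)   # one block of vectors
q = rng.standard_normal(64).astype(np.float32)
block = block_dimension_major(X)
dist = pruned_l2(block, q, threshold=120.0)               # made-up pruning threshold
print(f"candidates kept: {np.isfinite(dist).sum()} / {len(dist)}, best: {dist.min():.2f}")
```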
arXiv Detail & Related papers (2025-03-06T13:31:16Z)
- Generalized compression and compressive search of large datasets [0.0]
panCAKES is a novel approach to compressive search, i.e., a way to perform $k$-NN and $\rho$-NN search on compressed data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently. We benchmark panCAKES on a variety of datasets, including genomic, proteomic, and set data.
arXiv Detail & Related papers (2024-09-18T17:25:31Z)
- Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
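To make the binary token index concrete, here is a minimal sketch written for this summary (not the SVDR code): each document becomes a binary bag-of-tokens vector, and retrieval scores candidates by how many query tokens they contain. The tokenizer and scoring are deliberately simplistic.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_binary_token_index(docs, vocab):
    """Binary bag-of-tokens matrix: entry (i, j) = 1 iff token j occurs in doc i."""
    rows, cols = [], []
    for i, doc in enumerate(docs):
        for tok in set(doc.lower().split()):
            if tok in vocab:
                rows.append(i)
                cols.append(vocab[tok])
    data = np.ones(len(rows), dtype=np.int8)
    return csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocab)))

docs = ["lossless compression of vector ids",
        "graph based indices for similarity search",
        "entropy coding with asymmetric numeral systems"]
vocab = {t: j for j, t in enumerate(sorted({w for d in docs for w in d.lower().split()}))}
index = build_binary_token_index(docs, vocab)

query = "compression of graph indices"
q = np.zeros(len(vocab), dtype=np.int8)
for tok in set(query.lower().split()):
    if tok in vocab:
        q[vocab[tok]] = 1
scores = index @ q                      # overlap between query tokens and each document
print("ranking:", np.argsort(-scores))  # best-matching documents first
```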
arXiv Detail & Related papers (2024-05-03T08:34:13Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- The Faiss library [54.589857872477445]
Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors.
This paper describes the trade-off space of vector search and the design principles of Faiss in terms of structure, approach to optimization and interfacing.
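Since the id-compression schemes in the main paper target Faiss-style inverted-file (IVF) indices, a minimal example of building such an index may be useful context. It uses the public Faiss Python API (IndexFlatL2, IndexIVFPQ); the dataset sizes and parameters are made up, and nothing here is specific to the compression method above.

```python
import numpy as np
import faiss

d, nb, nq = 64, 100_000, 5             # dimension, database size, number of queries
rng = np.random.default_rng(0)
xb = rng.standard_normal((nb, d)).astype(np.float32)
xq = rng.standard_normal((nq, d)).astype(np.float32)

nlist, m, nbits = 1024, 8, 8            # IVF cells, PQ sub-quantizers, bits per code
quantizer = faiss.IndexFlatL2(d)        # coarse quantizer assigning vectors to cells
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                         # learn coarse centroids and PQ codebooks
index.add(xb)                           # inverted lists store (vector id, PQ code) pairs

index.nprobe = 16                       # number of cells scanned per query
distances, ids = index.search(xq, 10)   # 10 approximate nearest neighbors per query
print(ids[0])
```

Each inverted list in such an index stores a vector id next to each quantized code, and those per-entry ids are exactly the auxiliary data the main paper compresses losslessly.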
arXiv Detail & Related papers (2024-01-16T11:12:36Z)
- Similarity search in the blink of an eye with compressed indices [3.39271933237479]
Graph-based indices are currently the best performing techniques for billion-scale similarity search.
We present new techniques and systems for creating faster and smaller graph-based indices.
arXiv Detail & Related papers (2023-04-07T23:10:39Z)
- Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing or product quantizers as auto-encoders.
We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
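A product quantizer is naturally viewed as an auto-encoder: the encoder maps each sub-vector to the id of its nearest sub-codebook centroid, and the decoder concatenates the selected centroids. The sketch below is a generic PQ round-trip written from scratch with random (untrained) codebooks; it is not the backward-compatible decoder proposed in that paper.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encoder: for each sub-vector, store the index of the nearest centroid."""
    m, ksub, dsub = codebooks.shape
    codes = np.empty(m, dtype=np.uint8)
    for j in range(m):
        sub = x[j * dsub:(j + 1) * dsub]
        codes[j] = np.argmin(((codebooks[j] - sub) ** 2).sum(axis=1))
    return codes

def pq_decode(codes, codebooks):
    """Decoder: reconstruct the vector by concatenating the selected centroids."""
    return np.concatenate([codebooks[j, c] for j, c in enumerate(codes)])

rng = np.random.default_rng(0)
m, ksub, dsub = 8, 256, 8                       # 8 sub-quantizers, 256 centroids, 8 dims each
codebooks = rng.standard_normal((m, ksub, dsub)).astype(np.float32)
x = rng.standard_normal(m * dsub).astype(np.float32)

codes = pq_encode(x, codebooks)                 # 8 bytes instead of 64 floats
x_hat = pq_decode(codes, codebooks)
print("reconstruction MSE:", float(((x - x_hat) ** 2).mean()))
```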
arXiv Detail & Related papers (2021-12-17T15:22:28Z)
- Using Convolutional Neural Networks to Detect Compression Algorithms [0.0]
We took a base dataset, compressed every file with various algorithms, and designed a model based on the result.
The model was able to accurately identify files compressed with compress, lzip and bzip2.
arXiv Detail & Related papers (2021-11-17T11:03:16Z)
- Partition and Code: learning how to compress graphs [50.29024357495154]
"Partition and Code" framework entails three steps: first, a partitioning algorithm decomposes the graph into elementary structures, then these are mapped to the elements of a small dictionary on which we learn a probability distribution, and finally, an entropy encoder translates the representation into bits.
Our algorithms are quantitatively evaluated on diverse real-world networks, obtaining significant performance improvements with respect to different families of non-parametric and parametric graph compressors.
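The three-step pipeline is easy to caricature in a few lines. The toy below is a hand-rolled simplification, not the authors' code: it partitions nodes into fixed consecutive blocks, treats each block's induced subgraph as a dictionary symbol, and reports the ideal entropy-coded length of the symbol sequence. It ignores cross-block edges and the cost of transmitting the dictionary itself, both of which a real compressor must account for.

```python
import numpy as np
from collections import Counter

def partition_and_code_length(edges, n_nodes, block=4):
    """Toy partition -> dictionary -> entropy-code pipeline."""
    # Step 1: trivial partition into consecutive blocks of `block` nodes.
    part = {v: v // block for v in range(n_nodes)}
    # Step 2: canonical key per block = its induced subgraph in local indices
    # (local index = node id mod block, valid because blocks are consecutive).
    structures = {}
    for u, v in edges:
        if part[u] == part[v]:
            structures.setdefault(part[u], set()).add((u % block, v % block))
    symbols = [frozenset(s) for s in structures.values()]
    # Step 3: empirical distribution over dictionary entries -> ideal codelength.
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c * np.log2(c / total) for c in counts.values())

# Toy graph: a 16-node ring plus two chords, so two distinct block structures appear.
edges = [(i, (i + 1) % 16) for i in range(16)] + [(0, 2), (8, 10)]
print(f"estimated structure codelength: {partition_and_code_length(edges, 16):.1f} bits")
```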
arXiv Detail & Related papers (2021-07-05T11:41:16Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
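The permutation observation is easy to verify numerically: permuting the hidden units of one layer and applying the same permutation to the columns of the next layer's weight matrix leaves the function unchanged, because the elementwise nonlinearity commutes with permutations. The two-layer ReLU MLP below is a self-contained check, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4
W1, b1 = rng.standard_normal((d_hidden, d_in)), rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)    # two-layer ReLU MLP, no output bias

perm = rng.permutation(d_hidden)
W1p, b1p = W1[perm], b1[perm]                   # permute the hidden units
W2p = W2[:, perm]                               # apply the same permutation downstream

x = rng.standard_normal(d_in)
print(np.allclose(mlp(x, W1, b1, W2), mlp(x, W1p, b1p, W2p)))   # True: same function
```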
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression [77.8842824702423]
We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds.
Our method exploits the sparsity and structural redundancy between points to reduce the memory footprint.
Our algorithm can be used to reduce the onboard and offboard storage of LiDAR points for applications such as self-driving cars.
arXiv Detail & Related papers (2020-05-14T17:48:49Z)