On Storage Neural Network Augmented Approximate Nearest Neighbor Search
- URL: http://arxiv.org/abs/2501.16375v1
- Date: Thu, 23 Jan 2025 06:56:18 GMT
- Title: On Storage Neural Network Augmented Approximate Nearest Neighbor Search
- Authors: Taiga Ikeda, Daisuke Miyashita, Jun Deguchi
- Abstract summary: It is necessary to search for the vectors most similar to a given query vector in data stored on storage devices, rather than in data held in memory.
In this paper, we propose a method to predict the correct clusters by means of a neural network.
Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
- Score: 1.3654846342364308
- Abstract: Large-scale approximate nearest neighbor search (ANN) has been gaining attention along with the latest machine learning research employing ANN. If the data is too large to fit in memory, the vectors most similar to a given query vector must be searched for in data stored on storage devices rather than in data held in memory. Storage devices such as NAND flash memory have larger capacity than memory devices such as DRAM, but they also have higher latency when reading data. Therefore, ANN methods for storage require completely different approaches from conventional in-memory ANN methods. Under reasonable assumptions, the search time is well approximated as being determined only by the amount of data fetched from storage, so our goal is to minimize this amount while maximizing recall. For partitioning-based ANNs, vectors are partitioned into clusters in the index building phase. In the search phase, some of the clusters are chosen, the vectors in the chosen clusters are fetched from storage, and the nearest vector is retrieved from the fetched vectors. Thus, the key point is to accurately select the clusters containing the ground-truth nearest neighbor vectors. We accomplish this by proposing a method to predict the correct clusters by means of a neural network that is gradually refined by alternating supervised learning and duplicated cluster assignment. Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
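The partitioning-based pipeline the abstract describes (k-means partitioning at index-build time, cluster selection at query time, linear scan over the fetched vectors) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's KMeans and MLPClassifier as stand-ins for the paper's clustering and cluster-prediction network; the random data, cluster counts, and training setup are hypothetical, and the alternating supervised learning / duplicated cluster assignment refinement is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
base = rng.standard_normal((10_000, 128)).astype(np.float32)   # stand-in for a base set like SIFT1M
n_clusters = 64

# Index building: partition the base vectors into clusters (these would live on storage).
km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(base)
clusters = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]

# Supervision for the cluster predictor: for each training query, the cluster that
# holds its exact nearest neighbor (computed offline).
train_q = rng.standard_normal((2_000, 128)).astype(np.float32)
_, nn_idx = NearestNeighbors(n_neighbors=1).fit(base).kneighbors(train_q)
target_cluster = km.labels_[nn_idx[:, 0]]
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0).fit(train_q, target_cluster)

def search(query, n_probe=4):
    """Fetch only the n_probe clusters the network ranks highest, then linear-scan them."""
    probs = clf.predict_proba(query[None, :])[0]
    top = clf.classes_[np.argsort(probs)[::-1][:n_probe]]
    cand = np.concatenate([clusters[c] for c in top])      # the "data fetched from storage"
    dist = ((base[cand] - query) ** 2).sum(axis=1)
    return cand[np.argmin(dist)], len(cand)

nearest, fetched = search(rng.standard_normal(128).astype(np.float32))
print(nearest, fetched, "of", len(base), "vectors fetched")
```

The n_probe knob trades recall against the number of vectors fetched, which is the quantity the paper seeks to minimize.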
Related papers
- On Simplifying Large-Scale Spatial Vectors: Fast, Memory-Efficient, and Cost-Predictable k-means [7.192072206801592]
The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning.
Existing k-means algorithms achieve high performance only at the cost of significant computational resources, such as memory and CPU time.
We propose a fast, memory-efficient, and cost-predictable k-means called Dask-means.
arXiv Detail & Related papers (2024-12-03T08:16:59Z)
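As a rough illustration of k-means-based simplification (not Dask-means itself, whose memory- and cost-aware design is the paper's contribution), a point cloud can be replaced by a much smaller set of weighted centroids. scikit-learn's MiniBatchKMeans is used here only to keep memory bounded, and all sizes are arbitrary.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
points = rng.standard_normal((200_000, 3)).astype(np.float32)   # stand-in 3D point cloud

# Simplify the point cloud to k representative centroids; MiniBatchKMeans
# processes the data in small batches, keeping memory usage bounded.
k = 1_000
km = MiniBatchKMeans(n_clusters=k, batch_size=4_096, random_state=0).fit(points)

centroids = km.cluster_centers_                      # the simplified vector set
weights = np.bincount(km.labels_, minlength=k)       # how many points each centroid stands for
print(centroids.shape, weights.sum())                # (1000, 3) 200000
```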
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption.
We show that RetrievalAttention achieves near full attention accuracy while requiring access to only 1-3% of the data.
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
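A minimal sketch of attending over retrieved keys only: score just the k keys most similar to the query instead of the whole KV cache. The exact top-k below is a stand-in for the ANN retrieval the paper uses, and all shapes are illustrative.

```python
import numpy as np

def topk_retrieval_attention(q, K, V, k):
    """Attend over only the k keys most similar to the query.

    Retrieving k << len(K) candidates (exact top-k here, an ANN index in practice)
    approximates the attention output while touching a small fraction of the KV cache.
    """
    scores = K @ q                               # an ANN index would replace this full scan
    idx = np.argpartition(scores, -k)[-k:]
    s = scores[idx] / np.sqrt(q.shape[0])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
d, n = 64, 100_000
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
q = rng.standard_normal(d)
out = topk_retrieval_attention(q, K, V, k=n // 100)   # touch roughly 1% of the keys
```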
- AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval [1.099532646524593]
DiskANN achieves a good recall-speed balance for large-scale datasets by using both RAM and storage.
Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage still grows in proportion to the scale of the dataset.
We propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to storage.
arXiv Detail & Related papers (2024-04-09T04:20:27Z)
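A minimal product-quantization sketch, assuming plain k-means codebooks: compact per-vector codes of this kind are what AiSAQ keeps on storage instead of in DRAM, but this is generic PQ, not the paper's index layout.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, m=8, ksub=256):
    """Train one k-means codebook per subspace (generic product quantization)."""
    d = X.shape[1] // m
    return [KMeans(n_clusters=ksub, n_init=1, random_state=0).fit(X[:, i*d:(i+1)*d]).cluster_centers_
            for i in range(m)]

def pq_encode(X, codebooks):
    """Replace each vector with m one-byte codes (one centroid id per subspace)."""
    d = codebooks[0].shape[1]
    codes = [np.argmin(((X[:, i*d:(i+1)*d][:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
             for i, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_adc(query, codes, codebooks):
    """Asymmetric distance computation: the raw query vs. the compressed database."""
    d = codebooks[0].shape[1]
    tables = [((cb - query[i*d:(i+1)*d]) ** 2).sum(axis=1) for i, cb in enumerate(codebooks)]
    return sum(tables[i][codes[:, i]] for i in range(len(codebooks)))

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 64)).astype(np.float32)
cbs = pq_train(X)
codes = pq_encode(X, cbs)          # 8 bytes per vector instead of 256
query = rng.standard_normal(64).astype(np.float32)
print(np.argmin(pq_adc(query, codes, cbs)))
```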
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel re-ranking technique, which has a lower upper-bound time complexity and reduces the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
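The O(n²) to O(kn) memory reduction can be illustrated by storing only each sample's k nearest neighbors instead of the full pairwise-distance matrix. The shared-neighbor count below is a generic stand-in for the paper's re-ranking signal; names and sizes are made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
feats = rng.standard_normal((20_000, 128)).astype(np.float32)
k = 30

# Keep only each sample's k nearest neighbors: O(kn) memory instead of the
# O(n^2) full pairwise-distance matrix (each point also lists itself here).
nn = NearestNeighbors(n_neighbors=k).fit(feats)
dist, idx = nn.kneighbors(feats)          # both arrays are (n, k)

def shared_neighbors(a, b):
    """A simple neighborhood-based re-ranking signal: neighbors two samples share."""
    return len(np.intersect1d(idx[a], idx[b]))

print(shared_neighbors(0, 1))
```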
- Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensions for LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z)
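A hedged sketch of importance resampling with hashed n-gram features: hash unigrams and bigrams into a fixed number of buckets, fit bucket distributions on the target and raw sets, weight each raw document by the likelihood ratio, and resample. The bucket count, smoothing, and toy documents are illustrative, not DSIR's actual configuration.

```python
import zlib
import numpy as np
from collections import Counter

B = 10_000  # number of hash buckets (illustrative)

def hashed_ngram_counts(text, n=2):
    """Bag of hashed unigrams and n-grams as a length-B count vector."""
    toks = text.lower().split()
    grams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    vec = np.zeros(B)
    for b, c in Counter(zlib.crc32(g.encode()) % B for g in grams).items():
        vec[b] = c
    return vec

def bucket_distribution(docs):
    total = sum(hashed_ngram_counts(d) for d in docs) + 1.0   # add-one smoothing
    return total / total.sum()

target_docs = ["neural networks learn representations", "transformers scale with data"]
raw_docs = ["the cat sat on the mat", "gradient descent minimizes a loss",
            "stock prices fell sharply today", "attention layers mix token information"]

p_target, p_raw = bucket_distribution(target_docs), bucket_distribution(raw_docs)
log_w = np.array([hashed_ngram_counts(d) @ (np.log(p_target) - np.log(p_raw)) for d in raw_docs])

# Gumbel-top-k trick: sample documents with probability tilted toward high importance weight.
keep = np.argsort(log_w + np.random.default_rng(0).gumbel(size=len(log_w)))[-2:]
print([raw_docs[i] for i in keep])
```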
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
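A minimal sketch of the random-feature idea only, not RFAD's full distillation loop: random ReLU features whose inner products are a Monte Carlo estimate of the one-hidden-layer ReLU NNGP kernel (the order-1 arc-cosine kernel), so they can stand in for the exact kernel inside kernel-based objectives. Dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(X, W):
    """Features whose inner products approximate K(x, y) = E_w[relu(w.x) relu(w.y)], w ~ N(0, I)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

d, D = 16, 4_096                      # input dimension, number of random features
W = rng.standard_normal((D, d))
X = rng.standard_normal((200, d))

Phi = random_relu_features(X, W)      # (200, D)
K_approx = Phi @ Phi.T                # plugs in wherever the exact kernel would be used

# Exact order-1 arc-cosine kernel (the closed form of the expectation above), for comparison.
norms = np.linalg.norm(X, axis=1)
cos = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
theta = np.arccos(cos)
K_exact = np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
print(np.abs(K_approx - K_exact).max())
```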
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
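Streaming kernel ridge regression can be sketched by accumulating random-feature sufficient statistics batch by batch, so no batch ever has to stay in memory. This uses plain random Fourier features as a stand-in; StreaMRAK's multi-resolution, adaptive kernels are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, lam = 2, 512, 1e-3
W = rng.standard_normal((D, d))            # random Fourier frequencies for an RBF kernel
b_phase = rng.uniform(0, 2 * np.pi, D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b_phase)

# Streaming sufficient statistics: A = Phi^T Phi + lam*I, r = Phi^T y.
A = lam * np.eye(D)
r = np.zeros(D)

for _ in range(100):                        # stand-in for a data stream
    Xb = rng.uniform(-3, 3, size=(256, d))
    yb = np.sin(Xb[:, 0]) + 0.1 * rng.standard_normal(256)
    P = phi(Xb)
    A += P.T @ P                            # each batch can be discarded after this update
    r += P.T @ yb

weights = np.linalg.solve(A, r)             # fit after (or at any point during) streaming
print(phi(np.array([[1.0, 0.0]])) @ weights)
```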
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods must trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
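A toy version of the iterative idea, under made-up data: learn a classifier that routes queries to buckets, re-assign each item to the bucket its relevant queries are most often routed to, and repeat. Load balancing and multi-probe querying, which are central to IRLI, are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_items, n_queries, d, n_buckets = 2_000, 5_000, 32, 16

items = rng.standard_normal((n_items, d))
queries = rng.standard_normal((n_queries, d))
relevant_item = np.argmax(queries @ items.T, axis=1)     # stand-in query-item relevance data

item_bucket = rng.integers(0, n_buckets, n_items)        # start from a random partition

for _ in range(3):                                        # iterative re-partitioning
    # 1) learn to map a query to the bucket of its relevant item
    clf = LogisticRegression(max_iter=500).fit(queries, item_bucket[relevant_item])
    # 2) move each item to the bucket its relevant queries are routed to most often
    pred = clf.predict(queries)
    for i in range(n_items):
        votes = pred[relevant_item == i]
        if votes.size:
            item_bucket[i] = np.bincount(votes, minlength=n_buckets).argmax()

def search(q):
    bucket = clf.predict(q[None, :])[0]
    cand = np.where(item_bucket == bucket)[0]             # only this bucket is scanned
    if cand.size == 0:
        cand = np.arange(n_items)
    return cand[np.argmax(items[cand] @ q)]

print(search(rng.standard_normal(d)))
```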
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
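For orientation, here is a plain consensus-ADMM sketch for a least-squares problem whose data is sharded across edge nodes; it is neither coded nor mini-batch/stochastic, unlike the paper's variants, the averaging step stands in for the decentralized exchange, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d, rho = 5, 20, 1.0
x_true = rng.standard_normal(d)

# Each edge node holds its own local data shard (A_i, b_i).
shards = []
for _ in range(n_nodes):
    A = rng.standard_normal((200, d))
    shards.append((A, A @ x_true + 0.05 * rng.standard_normal(200)))

x = [np.zeros(d) for _ in range(n_nodes)]   # local primal variables
u = [np.zeros(d) for _ in range(n_nodes)]   # local scaled dual variables
z = np.zeros(d)                             # consensus variable

for _ in range(50):
    # local primal updates, computed independently at each node
    for i, (A, b) in enumerate(shards):
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(d), A.T @ b + rho * (z - u[i]))
    # consensus step (a simple average standing in for the decentralized exchange)
    z = np.mean([x[i] + u[i] for i in range(n_nodes)], axis=0)
    # dual updates
    for i in range(n_nodes):
        u[i] = u[i] + x[i] - z

print(np.linalg.norm(z - x_true))           # should be small
```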
- SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets [0.0]
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets.
Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity.
arXiv Detail & Related papers (2020-06-13T11:07:37Z)
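A crude, hedged sketch of batch-wise density-based outlier scoring, not SDCOR's actual algorithm: build a rough clustering model from sequential chunks with partial_fit, then score each point by its distance to the nearest centroid relative to that cluster's own spread. Cluster counts, chunk sizes, and thresholds are arbitrary.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50_000, 8)),
                  rng.normal(6, 1, (50_000, 8)),
                  rng.uniform(-10, 10, (200, 8))])       # a few scattered outliers

# Pass 1: build a rough density model one chunk at a time (bounded memory).
km = MiniBatchKMeans(n_clusters=20, random_state=0)
for chunk in np.array_split(data, 100):
    km.partial_fit(chunk)

# Pass 2: score each point by its distance to the nearest centroid,
# normalized by that cluster's own spread (a crude local density score).
labels = km.predict(data)
dists = np.linalg.norm(data - km.cluster_centers_[labels], axis=1)
spread = np.array([dists[labels == c].mean() if np.any(labels == c) else 1.0
                   for c in range(20)])
scores = dists / spread[labels]
outliers = np.argsort(scores)[-200:]                     # flag the most anomalous points
print(outliers[:10])
```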