On Storage Neural Network Augmented Approximate Nearest Neighbor Search
- URL: http://arxiv.org/abs/2501.16375v1
- Date: Thu, 23 Jan 2025 06:56:18 GMT
- Title: On Storage Neural Network Augmented Approximate Nearest Neighbor Search
- Authors: Taiga Ikeda, Daisuke Miyashita, Jun Deguchi
- Abstract summary: It is necessary to search for the vectors most similar to a given query vector in data stored on storage devices, rather than in data held in memory.
In this paper, we propose a method to predict the correct clusters by means of a neural network.
Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
- Score: 1.3654846342364308
- Abstract: Large-scale approximate nearest neighbor search (ANN) has been gaining attention along with the latest machine learning research employing ANN. If the data is too large to fit in memory, the vectors most similar to a given query vector must be searched for in data stored on storage devices rather than in data held in memory. Storage devices such as NAND flash memory have larger capacity than memory devices such as DRAM, but they also have higher latency when reading data. Therefore, ANN methods for storage require completely different approaches from conventional in-memory ANN methods. Under reasonable assumptions, the search time is well approximated as being determined only by the amount of data fetched from storage, so our goal is to minimize this amount while maximizing recall. For partitioning-based ANNs, vectors are partitioned into clusters in the index building phase. In the search phase, some of the clusters are chosen, the vectors in the chosen clusters are fetched from storage, and the nearest vector is retrieved from the fetched vectors. Thus, the key point is to accurately select the clusters containing the ground-truth nearest neighbor vectors. We accomplish this by proposing a method to predict the correct clusters by means of a neural network that is gradually refined by alternating supervised learning and duplicated cluster assignment. Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
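The partitioning-based pipeline the abstract describes (k-means partitioning at index-build time, cluster selection at query time, linear scan over the fetched vectors) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's KMeans and MLPClassifier as stand-ins for the paper's clustering and cluster-prediction network; the random data, cluster counts, and training setup are hypothetical, and the alternating supervised learning / duplicated cluster assignment refinement is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
base = rng.standard_normal((10_000, 128)).astype(np.float32)   # stand-in for a base set like SIFT1M
n_clusters = 64

# Index building: partition the base vectors into clusters (these would live on storage).
km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(base)
clusters = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]

# Supervision for the cluster predictor: for each training query, the cluster that
# holds its exact nearest neighbor (computed offline).
train_q = rng.standard_normal((2_000, 128)).astype(np.float32)
_, nn_idx = NearestNeighbors(n_neighbors=1).fit(base).kneighbors(train_q)
target_cluster = km.labels_[nn_idx[:, 0]]
clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0).fit(train_q, target_cluster)

def search(query, n_probe=4):
    """Fetch only the n_probe clusters the network ranks highest, then linear-scan them."""
    probs = clf.predict_proba(query[None, :])[0]
    top = clf.classes_[np.argsort(probs)[::-1][:n_probe]]
    cand = np.concatenate([clusters[c] for c in top])      # the "data fetched from storage"
    dist = ((base[cand] - query) ** 2).sum(axis=1)
    return cand[np.argmin(dist)], len(cand)

nearest, fetched = search(rng.standard_normal(128).astype(np.float32))
print(nearest, fetched, "of", len(base), "vectors fetched")
```

The n_probe knob trades recall against the number of vectors fetched, which is the quantity the paper seeks to minimize.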
Related papers
- On Simplifying Large-Scale Spatial Vectors: Fast, Memory-Efficient, and Cost-Predictable k-means [7.192072206801592]
The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning.
Existing k-means algorithms achieve high performance only at the cost of significant computational resources, such as memory and CPU time.
We propose a fast, memory-efficient, and cost-predictable k-means called Dask-means.
arXiv Detail & Related papers (2024-12-03T08:16:59Z)
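As a rough illustration of k-means-based simplification (not Dask-means itself, whose memory- and cost-aware design is the paper's contribution), a point cloud can be replaced by a much smaller set of weighted centroids. scikit-learn's MiniBatchKMeans is used here only to keep memory bounded, and all sizes are arbitrary.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
points = rng.standard_normal((200_000, 3)).astype(np.float32)   # stand-in 3D point cloud

# Simplify the point cloud to k representative centroids; MiniBatchKMeans
# processes the data in small batches, keeping memory usage bounded.
k = 1_000
km = MiniBatchKMeans(n_clusters=k, batch_size=4_096, random_state=0).fit(points)

centroids = km.cluster_centers_                      # the simplified vector set
weights = np.bincount(km.labels_, minlength=k)       # how many points each centroid stands for
print(centroids.shape, weights.sum())                # (1000, 3) 200000
```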
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption.
We show that RetrievalAttention achieves near full attention accuracy while requiring access to only 1-3% of the data.
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
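A minimal sketch of attending over retrieved keys only: score just the k keys most similar to the query instead of the whole KV cache. The exact top-k below is a stand-in for the ANN retrieval the paper uses, and all shapes are illustrative.

```python
import numpy as np

def topk_retrieval_attention(q, K, V, k):
    """Attend over only the k keys most similar to the query.

    Retrieving k << len(K) candidates (exact top-k here, an ANN index in practice)
    approximates the attention output while touching a small fraction of the KV cache.
    """
    scores = K @ q                               # an ANN index would replace this full scan
    idx = np.argpartition(scores, -k)[-k:]
    s = scores[idx] / np.sqrt(q.shape[0])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
d, n = 64, 100_000
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
q = rng.standard_normal(d)
out = topk_retrieval_attention(q, K, V, k=n // 100)   # touch roughly 1% of the keys
```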
- AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval [1.099532646524593]
DiskANN achieves a good recall-speed balance for large-scale datasets by using both RAM and storage.
Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage still grows in proportion to the scale of the dataset.
We propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to storage.
arXiv Detail & Related papers (2024-04-09T04:20:27Z)
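A minimal product-quantization sketch, assuming plain k-means codebooks: compact per-vector codes of this kind are what AiSAQ keeps on storage instead of in DRAM, but this is generic PQ, not the paper's index layout.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, m=8, ksub=256):
    """Train one k-means codebook per subspace (generic product quantization)."""
    d = X.shape[1] // m
    return [KMeans(n_clusters=ksub, n_init=1, random_state=0).fit(X[:, i*d:(i+1)*d]).cluster_centers_
            for i in range(m)]

def pq_encode(X, codebooks):
    """Replace each vector with m one-byte codes (one centroid id per subspace)."""
    d = codebooks[0].shape[1]
    codes = [np.argmin(((X[:, i*d:(i+1)*d][:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
             for i, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_adc(query, codes, codebooks):
    """Asymmetric distance computation: the raw query vs. the compressed database."""
    d = codebooks[0].shape[1]
    tables = [((cb - query[i*d:(i+1)*d]) ** 2).sum(axis=1) for i, cb in enumerate(codebooks)]
    return sum(tables[i][codes[:, i]] for i in range(len(codebooks)))

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 64)).astype(np.float32)
cbs = pq_train(X)
codes = pq_encode(X, cbs)          # 8 bytes per vector instead of 256
query = rng.standard_normal(64).astype(np.float32)
print(np.argmin(pq_adc(query, codes, cbs)))
```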
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel re-ranking technique, which has a lower upper-bound time complexity and reduces the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
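The O(n²) to O(kn) memory reduction can be illustrated by storing only each sample's k nearest neighbors instead of the full pairwise-distance matrix. The shared-neighbor count below is a generic stand-in for the paper's re-ranking signal; names and sizes are made up.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
feats = rng.standard_normal((20_000, 128)).astype(np.float32)
k = 30

# Keep only each sample's k nearest neighbors: O(kn) memory instead of the
# O(n^2) full pairwise-distance matrix (each point also lists itself here).
nn = NearestNeighbors(n_neighbors=k).fit(feats)
dist, idx = nn.kneighbors(feats)          # both arrays are (n, k)

def shared_neighbors(a, b):
    """A simple neighborhood-based re-ranking signal: neighbors two samples share."""
    return len(np.intersect1d(idx[a], idx[b]))

print(shared_neighbors(0, 1))
```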
- Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensions for LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z)
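A hedged sketch of importance resampling with hashed n-gram features: hash unigrams and bigrams into a fixed number of buckets, fit bucket distributions on the target and raw sets, weight each raw document by the likelihood ratio, and resample. The bucket count, smoothing, and toy documents are illustrative, not DSIR's actual configuration.

```python
import zlib
import numpy as np
from collections import Counter

B = 10_000  # number of hash buckets (illustrative)

def hashed_ngram_counts(text, n=2):
    """Bag of hashed unigrams and n-grams as a length-B count vector."""
    toks = text.lower().split()
    grams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    vec = np.zeros(B)
    for b, c in Counter(zlib.crc32(g.encode()) % B for g in grams).items():
        vec[b] = c
    return vec

def bucket_distribution(docs):
    total = sum(hashed_ngram_counts(d) for d in docs) + 1.0   # add-one smoothing
    return total / total.sum()

target_docs = ["neural networks learn representations", "transformers scale with data"]
raw_docs = ["the cat sat on the mat", "gradient descent minimizes a loss",
            "stock prices fell sharply today", "attention layers mix token information"]

p_target, p_raw = bucket_distribution(target_docs), bucket_distribution(raw_docs)
log_w = np.array([hashed_ngram_counts(d) @ (np.log(p_target) - np.log(p_raw)) for d in raw_docs])

# Gumbel-top-k trick: sample documents with probability tilted toward high importance weight.
keep = np.argsort(log_w + np.random.default_rng(0).gumbel(size=len(log_w)))[-2:]
print([raw_docs[i] for i in keep])
```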
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
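A minimal sketch of the random-feature idea only, not RFAD's full distillation loop: random ReLU features whose inner products are a Monte Carlo estimate of the one-hidden-layer ReLU NNGP kernel (the order-1 arc-cosine kernel), so they can stand in for the exact kernel inside kernel-based objectives. Dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(X, W):
    """Features whose inner products approximate K(x, y) = E_w[relu(w.x) relu(w.y)], w ~ N(0, I)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

d, D = 16, 4_096                      # input dimension, number of random features
W = rng.standard_normal((D, d))
X = rng.standard_normal((200, d))

Phi = random_relu_features(X, W)      # (200, D)
K_approx = Phi @ Phi.T                # plugs in wherever the exact kernel would be used

# Exact order-1 arc-cosine kernel (the closed form of the expectation above), for comparison.
norms = np.linalg.norm(X, axis=1)
cos = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
theta = np.arccos(cos)
K_exact = np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
print(np.abs(K_approx - K_exact).max())
```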
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
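Streaming kernel ridge regression can be sketched by accumulating random-feature sufficient statistics batch by batch, so no batch ever has to stay in memory. This uses plain random Fourier features as a stand-in; StreaMRAK's multi-resolution, adaptive kernels are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, lam = 2, 512, 1e-3
W = rng.standard_normal((D, d))            # random Fourier frequencies for an RBF kernel
b_phase = rng.uniform(0, 2 * np.pi, D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b_phase)

# Streaming sufficient statistics: A = Phi^T Phi + lam*I, r = Phi^T y.
A = lam * np.eye(D)
r = np.zeros(D)

for _ in range(100):                        # stand-in for a data stream
    Xb = rng.uniform(-3, 3, size=(256, d))
    yb = np.sin(Xb[:, 0]) + 0.1 * rng.standard_normal(256)
    P = phi(Xb)
    A += P.T @ P                            # each batch can be discarded after this update
    r += P.T @ yb

weights = np.linalg.solve(A, r)             # fit after (or at any point during) streaming
print(phi(np.array([[1.0, 0.0]])) @ weights)
```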
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods must trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)
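A toy version of the iterative idea, under made-up data: learn a classifier that routes queries to buckets, re-assign each item to the bucket its relevant queries are most often routed to, and repeat. Load balancing and multi-probe querying, which are central to IRLI, are omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_items, n_queries, d, n_buckets = 2_000, 5_000, 32, 16

items = rng.standard_normal((n_items, d))
queries = rng.standard_normal((n_queries, d))
relevant_item = np.argmax(queries @ items.T, axis=1)     # stand-in query-item relevance data

item_bucket = rng.integers(0, n_buckets, n_items)        # start from a random partition

for _ in range(3):                                        # iterative re-partitioning
    # 1) learn to map a query to the bucket of its relevant item
    clf = LogisticRegression(max_iter=500).fit(queries, item_bucket[relevant_item])
    # 2) move each item to the bucket its relevant queries are routed to most often
    pred = clf.predict(queries)
    for i in range(n_items):
        votes = pred[relevant_item == i]
        if votes.size:
            item_bucket[i] = np.bincount(votes, minlength=n_buckets).argmax()

def search(q):
    bucket = clf.predict(q[None, :])[0]
    cand = np.where(item_bucket == bucket)[0]             # only this bucket is scanned
    if cand.size == 0:
        cand = np.arange(n_items)
    return cand[np.argmax(items[cand] @ q)]

print(search(rng.standard_normal(d)))
```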
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
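For orientation, here is a plain consensus-ADMM sketch for a least-squares problem whose data is sharded across edge nodes; it is neither coded nor mini-batch/stochastic, unlike the paper's variants, the averaging step stands in for the decentralized exchange, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d, rho = 5, 20, 1.0
x_true = rng.standard_normal(d)

# Each edge node holds its own local data shard (A_i, b_i).
shards = []
for _ in range(n_nodes):
    A = rng.standard_normal((200, d))
    shards.append((A, A @ x_true + 0.05 * rng.standard_normal(200)))

x = [np.zeros(d) for _ in range(n_nodes)]   # local primal variables
u = [np.zeros(d) for _ in range(n_nodes)]   # local scaled dual variables
z = np.zeros(d)                             # consensus variable

for _ in range(50):
    # local primal updates, computed independently at each node
    for i, (A, b) in enumerate(shards):
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(d), A.T @ b + rho * (z - u[i]))
    # consensus step (a simple average standing in for the decentralized exchange)
    z = np.mean([x[i] + u[i] for i in range(n_nodes)], axis=0)
    # dual updates
    for i in range(n_nodes):
        u[i] = u[i] + x[i] - z

print(np.linalg.norm(z - x_true))           # should be small
```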
- SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets [0.0]
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets.
Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity.
arXiv Detail & Related papers (2020-06-13T11:07:37Z)
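A crude, hedged sketch of batch-wise density-based outlier scoring, not SDCOR's actual algorithm: build a rough clustering model from sequential chunks with partial_fit, then score each point by its distance to the nearest centroid relative to that cluster's own spread. Cluster counts, chunk sizes, and thresholds are arbitrary.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50_000, 8)),
                  rng.normal(6, 1, (50_000, 8)),
                  rng.uniform(-10, 10, (200, 8))])       # a few scattered outliers

# Pass 1: build a rough density model one chunk at a time (bounded memory).
km = MiniBatchKMeans(n_clusters=20, random_state=0)
for chunk in np.array_split(data, 100):
    km.partial_fit(chunk)

# Pass 2: score each point by its distance to the nearest centroid,
# normalized by that cluster's own spread (a crude local density score).
labels = km.predict(data)
dists = np.linalg.norm(data - km.cluster_centers_[labels], axis=1)
spread = np.array([dists[labels == c].mean() if np.any(labels == c) else 1.0
                   for c in range(20)])
scores = dists / spread[labels]
outliers = np.argsort(scores)[-200:]                     # flag the most anomalous points
print(outliers[:10])
```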