PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
- URL: http://arxiv.org/abs/2602.21247v1
- Date: Tue, 17 Feb 2026 02:18:17 GMT
- Title: PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
- Authors: Tobias Rubel, Richard Wen, Laxman Dhulipala, Lars Gottesbüren, Rajesh Jayaram, Jakub Łącki,
- Abstract summary: PiPNN is an ultra-scalable graph construction algorithm that avoids the search bottleneck'' that existing graph-based methods suffer from.<n>HashPrune enables PiPNN to partition the dataset into overlapping sub-problems, efficiently perform bulk distance comparisons, and stream a subset of the edges into HashPrune.<n>PiPNN builds state-of-the-art indexes up to 11.6x faster than Vamana and up to 12.9x faster than HNSW.
- Score: 6.243421401579046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fastest indexes for Approximate Nearest Neighbor Search today are also the slowest to build: graph-based methods like HNSW and Vamana achieve state-of-the-art query performance but have large construction times due to relying on random-access-heavy beam searches. We introduce PiPNN (Pick-in-Partitions Nearest Neighbors), an ultra-scalable graph construction algorithm that avoids this ``search bottleneck'' that existing graph-based methods suffer from. PiPNN's core innovation is HashPrune, a novel online pruning algorithm which dynamically maintains sparse collections of edges. HashPrune enables PiPNN to partition the dataset into overlapping sub-problems, efficiently perform bulk distance comparisons via dense matrix multiplication kernels, and stream a subset of the edges into HashPrune. HashPrune guarantees bounded memory during index construction which permits PiPNN to build higher quality indices without the use of extra intermediate memory. PiPNN builds state-of-the-art indexes up to 11.6x faster than Vamana (DiskANN) and up to 12.9x faster than HNSW. PiPNN is significantly more scalable than recent algorithms for fast graph construction. PiPNN builds indexes at least 19.1x faster than MIRAGE and 17.3x than FastKCNA while producing indexes that achieve higher query throughput. PiPNN enables us to build, for the first time, high-quality ANN indexes on billion-scale datasets in under 20 minutes using a single multicore machine.
Related papers
- Multiple Index Merge for Approximate Nearest Neighbor Search [14.386466486046814]
This paper focuses on efficient two-index merging and the merge order of multiple indexes for AKNN search.<n>We propose a reverse neighbor sliding merge (RNSM) that exploits structural information to boost merging efficiency.<n> Experiments show that our approach yields up to a 5.48$times$ speedup over existing index merge methods and 9.92$times$ speedup over index reconstruction.
arXiv Detail & Related papers (2026-02-19T05:50:34Z) - B+ANN: A Fast Billion-Scale Disk-based Nearest-Neighbor Index [3.4720326275852]
We present a novel disk-based ANN index, B+ANN, to address issues of the HNSW algorithm.<n>It partitions input data into blocks containing semantically similar items, then builds an B+ tree variant to store blocks both in-memory and on disks.<n>It improves both quality (Recall value), and execution performance (Queries per second/QPS) over HNSW.
arXiv Detail & Related papers (2025-11-19T15:50:28Z) - CRouting: Reducing Expensive Distance Calls in Graph-Based Approximate Nearest Neighbor Search [9.557937699715124]
Approximate nearest neighbor search (ANNS) is a crucial problem in information retrieval and AI applications.<n>We propose a novel routing strategy named CRouting, which bypasses unnecessary distance computations.<n>Our experiments show that CRouting reduces the number of distance computations by up to 41.5% and boosts queries per second by up to 1.48$times$ on two predominant graph indexes.
arXiv Detail & Related papers (2025-08-30T05:27:09Z) - In-Place Updates of a Graph Index for Streaming Approximate Nearest Neighbor Search [12.092920351505036]
IP-DiskANN is first algorithm to avoid batch consolidation by efficiently processing each insertion and deletion in-place.<n>It has stable recall over various lengthy update patterns in both high-recall and low-recall regimes.
arXiv Detail & Related papers (2025-02-19T15:41:08Z) - Compact Neural Graphics Primitives with Learned Hash Probing [100.07267906666293]
We show that a hash table with learned probes has neither disadvantage, resulting in a favorable combination of size and speed.
Inference is faster than unprobed hash tables at equal quality while training is only 1.2-2.6x slower.
arXiv Detail & Related papers (2023-12-28T18:58:45Z) - A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near
Neighbor Graph [51.880164098926166]
Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem.
There exists a practice-to-theory gap in the graph-based NN-Search algorithms.
We present theoretical guarantees of solving NN-Search via greedy search on ANN-Graph for low dimensional and dense vectors.
arXiv Detail & Related papers (2023-03-10T21:18:34Z) - Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on
Multi-Core CPUs [59.18990342943095]
t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data.
BH t-SNE algorithm is inefficient on existing CPU implementations.
Acc-t-SNE is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py.
arXiv Detail & Related papers (2022-12-22T06:38:40Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor
Search [20.928821121591493]
We propose FINGER, a fast inference method to achieve efficient graph search.
FINGER approximates the distance function by estimating angles between neighboring residual vectors with low-rank bases and distribution matching.
Empirically, accelerating a popular graph-based method named HNSW by FINGER is shown to outperform existing graph-based methods by 20%-60% across different benchmark datasets.
arXiv Detail & Related papers (2022-06-22T22:30:46Z) - Scalable Graph Neural Networks via Bidirectional Propagation [89.70835710988395]
Graph Neural Networks (GNN) is an emerging field for learning on non-Euclidean data.
This paper presents GBP, a scalable GNN that utilizes a localized bidirectional propagation process from both the feature vectors and the training/testing nodes.
An empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time.
arXiv Detail & Related papers (2020-10-29T08:55:33Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.