Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
- URL: http://arxiv.org/abs/2512.18060v1
- Date: Fri, 19 Dec 2025 21:00:07 GMT
- Title: Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
- Authors: Nina Mishra, Yonatan Naamad, Tal Wagner, Lichen Zhang,
- Abstract summary: We propose a new theoretical framework for graph-based Approximate nearest neighbor search (ANN) based on random walks.<n>We then utilize this framework to analyze a randomized deletion approach that preserves hitting time statistics compared to the graph before deleting the point.<n>We show that it provides better tradeoff between query latency, recall, deletion time, and memory usage through an extensive collection of experiments.
- Score: 14.905912653448686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval augmented generation. One of the most widely used algorithms for ANN is based on constructing a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW). While this algorithm supports insertion of new data, it does not support deletion of existing data. Moreover, deletion algorithms described by prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. We then utilize this framework to analyze a randomized deletion approach that preserves hitting time statistics compared to the graph before deleting the point. We then turn this theoretical framework into a deterministic deletion algorithm, and show that it provides better tradeoff between query latency, recall, deletion time, and memory usage through an extensive collection of experiments.
Related papers
- In-Place Updates of a Graph Index for Streaming Approximate Nearest Neighbor Search [12.092920351505036]
IP-DiskANN is first algorithm to avoid batch consolidation by efficiently processing each insertion and deletion in-place.<n>It has stable recall over various lengthy update patterns in both high-recall and low-recall regimes.
arXiv Detail & Related papers (2025-02-19T15:41:08Z) - LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search [4.194768796374315]
We propose a new supervised score computation method based on the observation that inner product approximation is a multi-output regression problem.
Our experiments show that the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage.
We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method.
arXiv Detail & Related papers (2024-10-24T17:13:39Z) - Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search [0.0]
We present a tensor graph rewriting approach that uses Monte Carlo tree search to build superior representation.
Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.
arXiv Detail & Related papers (2024-10-07T22:22:02Z) - A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.<n>We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.<n> Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - SOAR: Improved Indexing for Approximate Nearest Neighbor Search [30.752720306189342]
Spilling with Orthogonality-Amplified Residuals (SOAR) is a novel data indexing technique for approximate nearest neighbor (ANN) search.
arXiv Detail & Related papers (2024-03-31T19:09:09Z) - Temporal Aggregation and Propagation Graph Neural Networks for Dynamic
Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution with the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z) - A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near
Neighbor Graph [51.880164098926166]
Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem.
There exists a practice-to-theory gap in the graph-based NN-Search algorithms.
We present theoretical guarantees of solving NN-Search via greedy search on ANN-Graph for low dimensional and dense vectors.
arXiv Detail & Related papers (2023-03-10T21:18:34Z) - Decision Diagram-Based Branch-and-Bound with Caching for Dominance and
Suboptimality Detection [9.175779296469194]
This paper presents new ingredients to speed up the search by exploiting the structure of dynamic programming models.
The key idea is to prevent the repeated expansion of nodes corresponding to the same dynamic programming states by querying expansion thresholds cached throughout the search.
Experiments show that the pruning brought by this caching mechanism allows significantly reducing the number of nodes expanded by the algorithm.
arXiv Detail & Related papers (2022-11-22T10:18:33Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph
Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G.
matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms.
In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.