Related papers: Efficiently Constructing Sparse Navigable Graphs

Efficiently Constructing Sparse Navigable Graphs

URL: http://arxiv.org/abs/2507.13296v1
Date: Thu, 17 Jul 2025 17:04:18 GMT
Title: Efficiently Constructing Sparse Navigable Graphs
Authors: Alex Conway, Laxman Dhulipala, Martin Farach-Colton, Rob Johnson, Ben Landrum, Christopher Musco, Yarin Shechter, Torsten Suel, Richard Wen,
Abstract summary: We present an $tildeO(n2)$ time algorithm for constructing an $O(log n)$-approximate sparsest navigable graph under any distance function.<n>We also show that our techniques can beat cubic time for the closely related and practically important problems of constructing $alpha$-shortcut reachable and $tau$-monotonic graphs.
Score: 11.317292211864013
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph-based nearest neighbor search methods have seen a surge of popularity in recent years, offering state-of-the-art performance across a wide variety of applications. Central to these methods is the task of constructing a sparse navigable search graph for a given dataset endowed with a distance function. Unfortunately, doing so is computationally expensive, so heuristics are universally used in practice. In this work, we initiate the study of fast algorithms with provable guarantees for search graph construction. For a dataset with $n$ data points, the problem of constructing an optimally sparse navigable graph can be framed as $n$ separate but highly correlated minimum set cover instances. This yields a naive $O(n^3)$ time greedy algorithm that returns a navigable graph whose sparsity is at most $O(\log n)$ higher than optimal. We improve significantly on this baseline, taking advantage of correlation between the set cover instances to leverage techniques from streaming and sublinear-time set cover algorithms. Combined with problem-specific pre-processing techniques, we present an $\tilde{O}(n^2)$ time algorithm for constructing an $O(\log n)$-approximate sparsest navigable graph under any distance function. The runtime of our method is optimal up to logarithmic factors under the Strong Exponential Time Hypothesis via a reduction from Monochromatic Closest Pair. Moreover, we prove that, as with general set cover, obtaining better than an $O(\log n)$-approximation is NP-hard, despite the significant additional structure present in the navigable graph problem. Finally, we show that our techniques can also beat cubic time for the closely related and practically important problems of constructing $\alpha$-shortcut reachable and $\tau$-monotonic graphs, which are also used for nearest neighbor search. For such graphs, we obtain $\tilde{O}(n^{2.5})$ time or better algorithms.

Related papers

Incremental Approximate Single-Source Shortest Paths with Predictions [10.749658550545238]
We study the fundamental data structure problem of maintaining approximate shortest paths in incremental graphs.<n>We show these techniques immediately extend to the all-pairs shortest-path setting as well.<n>Our algorithms are consistent (performing nearly as fast as the offline algorithm) when predictions are nearly perfect.
arXiv Detail & Related papers (2025-02-12T05:06:23Z)
Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits [24.592554830963966]
A graph is navigable if we can successfully move from any starting node to any target node.<n>The important question for applications is if sparser graphs can be constructed.<n>We give a simple and efficient way to construct a navigable graph with average degree $O(sqrtn log n )$ for any set of $n$ points, in any dimension, for any distance function.
arXiv Detail & Related papers (2024-05-29T01:07:26Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
Fast Maximum $k$-Plex Algorithms Parameterized by Small Degeneracy Gaps [30.06993761032835]
The maximum $k$-plex problem is important but computationally challenging in applications such as graph mining and community detection. We present an exact algorithm parameterized by $g_k(G)$, which has the worst-case running time in the size of the input graph and exponential in $g_k(G)$. We further extend our discussion to an even smaller parameter $cg_k(G)$, the gap between the community-degeneracy bound and the size of the maximum $k$-plex.
arXiv Detail & Related papers (2023-06-23T01:28:24Z)
A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near Neighbor Graph [51.880164098926166]
Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem. There exists a practice-to-theory gap in the graph-based NN-Search algorithms. We present theoretical guarantees of solving NN-Search via greedy search on ANN-Graph for low dimensional and dense vectors.
arXiv Detail & Related papers (2023-03-10T21:18:34Z)
On the Unlikelihood of D-Separation [69.62839677485087]
We provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist. For the PC Algorithm, while it is known that its worst-case guarantees fail on non-sparse graphs, we show that the same is true for the average case. For UniformSGS, while it is known that the running time is exponential for existing edges, we show that in the average case, that is the expected running time for most non-existing edges as well.
arXiv Detail & Related papers (2023-03-10T00:11:18Z)
Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions [9.29659220237395]
We present a new approach for solving (minimum disagreement) correlation clustering. We obtain sublinear algorithms with highly efficient time and space complexity. The main ingredient of our approach is a novel connection to sparse-dense graph decompositions.
arXiv Detail & Related papers (2021-09-29T16:25:02Z)
Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm. For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated for $proptooperatornamepolylog(K)$ times. In contrast to existing approaches that update the policy for at least $Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering. We show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
arXiv Detail & Related papers (2021-06-10T09:29:05Z)
Random Graph Matching with Improved Noise Robustness [2.294014185517203]
We propose a new algorithm for graph matching under probabilistic models. Our algorithm recovers the underlying matching with high probability when $alpha le 1 / (log log n)C$. This improves the condition $alpha le 1 / (log n)C$ achieved in previous work.
arXiv Detail & Related papers (2021-01-28T02:39:27Z)
List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions. Our algorithm runs in time $widetildeO(ndk)$ for all $k = O(sqrtd) cup Omega(d)$, where $n$ is the size of the dataset. A variant of our algorithm has runtime $widetildeO(ndk)$ for all $k$, at the expense of an $O(sqrtlog k)$ factor in the recovery guarantee
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Near-Optimal Algorithms for Minimax Optimization [115.21519161773287]
The paper presents the first with $tildeO(sqrtkappa_mathbf xkappa_mathbf)$, matching the design on logarithmic factors. The paper also presents algorithms that match or outperform all existing methods in these settings in terms of complexity.
arXiv Detail & Related papers (2020-02-05T16:49:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.