A Note on Graph-Based Nearest Neighbor Search
- URL: http://arxiv.org/abs/2012.11083v1
- Date: Mon, 21 Dec 2020 02:18:05 GMT
- Title: A Note on Graph-Based Nearest Neighbor Search
- Authors: Hongya Wang, Zhizheng Wang, Wei Wang, Yingyuan Xiao, Zeng Zhao,
Kaixiang Yang
- Abstract summary: We show that high clustering coefficient makes most of the k nearest neighbors of q sit in a maximum strongly connected component ( SCC) in the graph.
We prove that the commonly used graph-based search algorithm is guaranteed to traverse the maximum SCC once visiting any point in it.
- Score: 4.38837720322254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nearest neighbor search has found numerous applications in machine learning,
data mining and massive data processing systems. The past few years have
witnessed the popularity of the graph-based nearest neighbor search paradigm
because of its superiority over the space-partitioning algorithms. While a lot
of empirical studies demonstrate the efficiency of graph-based algorithms, not
much attention has been paid to a more fundamental question: why graph-based
algorithms work so well in practice? And which data property affects the
efficiency and how? In this paper, we try to answer these questions. Our
insight is that "the probability that the neighbors of a point o tends to be
neighbors in the KNN graph" is a crucial data property for query efficiency.
For a given dataset, such a property can be qualitatively measured by
clustering coefficient of the KNN graph. To show how clustering coefficient
affects the performance, we identify that, instead of the global connectivity,
the local connectivity around some given query q has more direct impact on
recall. Specifically, we observed that high clustering coefficient makes most
of the k nearest neighbors of q sit in a maximum strongly connected component
(SCC) in the graph. From the algorithmic point of view, we show that the search
procedure is actually composed of two phases - the one outside the maximum SCC
and the other one in it, which is different from the widely accepted single or
multiple paths search models. We proved that the commonly used graph-based
search algorithm is guaranteed to traverse the maximum SCC once visiting any
point in it. Our analysis reveals that high clustering coefficient leads to
large size of the maximum SCC, and thus provides good answer quality with the
help of the two-phase search procedure. Extensive empirical results over a
comprehensive collection of datasets validate our findings.
Related papers
- PECANN: Parallel Efficient Clustering with Graph-Based Approximate
Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets.
It unifies the different variants of density peaks clustering into a single framework, PECANN.
We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z) - ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate
Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms.
We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z) - A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near
Neighbor Graph [51.880164098926166]
Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem.
There exists a practice-to-theory gap in the graph-based NN-Search algorithms.
We present theoretical guarantees of solving NN-Search via greedy search on ANN-Graph for low dimensional and dense vectors.
arXiv Detail & Related papers (2023-03-10T21:18:34Z) - Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z) - Scalable and Effective Conductance-based Graph Clustering [9.938406925123722]
We develop a graph clustering framework textitPCon.
We show that many existing solutions can be reduced to our framework.
Based on our framework, we propose two novel algorithms textitPCon_core and emphPCon_de with linear time and space complexity.
arXiv Detail & Related papers (2022-11-22T12:45:27Z) - FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor
Search [20.928821121591493]
We propose FINGER, a fast inference method to achieve efficient graph search.
FINGER approximates the distance function by estimating angles between neighboring residual vectors with low-rank bases and distribution matching.
Empirically, accelerating a popular graph-based method named HNSW by FINGER is shown to outperform existing graph-based methods by 20%-60% across different benchmark datasets.
arXiv Detail & Related papers (2022-06-22T22:30:46Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Mining Large Quasi-cliques with Quality Guarantees from Vertex
Neighborhoods [44.007522460918565]
Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks.
We show that mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem.
arXiv Detail & Related papers (2020-08-18T15:50:25Z) - Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.