Neighborhood Stability as a Measure of Nearest Neighbor Searchability
- URL: http://arxiv.org/abs/2602.16673v1
- Date: Wed, 18 Feb 2026 18:09:47 GMT
- Title: Neighborhood Stability as a Measure of Nearest Neighbor Searchability
- Authors: Thomas Vecchiato, Sebastian Bruch
- Abstract summary: Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset. We present two measures for flat clusterings of high-dimensional points in Euclidean space.
- Score: 8.035521056416242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset -- what we call "searchability." To address that gap, we present two measures for flat clusterings of high-dimensional points in Euclidean space. First is Clustering-Neighborhood Stability Measure (clustering-NSM), an internal measure of clustering quality -- a function of a clustering of a dataset -- that we show to be predictive of ANNS accuracy. The second, Point-Neighborhood Stability Measure (point-NSM), is a measure of clusterability -- a function of the dataset itself -- that is predictive of clustering-NSM. The two together allow us to determine whether a dataset is searchable by clustering-based ANNS given only the data points. Importantly, both are functions of nearest neighbor relationships between points, not distances, making them applicable to various distance functions including inner product.
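The search routine the abstract describes can be sketched as a minimal inverted-file (IVF) style index: cluster the points, then scan only the few partitions whose centroids lie closest to the query. This is a generic illustration, not the paper's method; the function names and parameters (`nprobe`, `topk`) are assumptions of this sketch.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: partition `points` (tuples) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    buckets = [[] for _ in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            buckets[j].append(p)
        # recompute centroids; keep the old one if a bucket is empty
        centroids = [
            tuple(sum(xs) / len(b) for xs in zip(*b)) if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    return centroids, buckets

def ivf_search(query, centroids, buckets, nprobe=1, topk=1):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    candidates = [p for i in order[:nprobe] for p in buckets[i]]
    return sorted(candidates, key=lambda p: math.dist(query, p))[:topk]
```

With `nprobe` equal to the number of partitions the search is exhaustive and exact; smaller values trade accuracy for speed, which is precisely the trade-off the paper's searchability measures aim to predict from the data alone.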
Related papers
- Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version) [50.41628860536753]
We propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process.
arXiv Detail & Related papers (2025-11-12T11:17:17Z) - K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters. It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z) - Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN. We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z) - Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data [0.0]
We develop an unsupervised clustering algorithm we call "Village-Net". The algorithm operates in two phases: first, utilizing K-Means clustering, it divides the dataset into distinct subsets we refer to as "villages". We present extensive benchmarking on extant real-world datasets with known ground-truth labels to showcase its competitive performance.
arXiv Detail & Related papers (2025-01-16T06:56:43Z) - Learning Cluster Representatives for Approximate Nearest Neighbor Search [0.0]
This thesis provides a comprehensive explanation of clustering-based approximate nearest neighbor search. It also introduces and delves into every aspect of our novel state-of-the-art method. The development of this intuition, and its application to maximum inner product search, has led us to demonstrate that learning cluster representatives using a simple linear function significantly boosts the accuracy of clustering-based approximate nearest neighbor search.
arXiv Detail & Related papers (2024-12-08T12:31:32Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - A Parametrizable Algorithm for Distributed Approximate Similarity Search with Arbitrary Distances [0.5030361857850012]
We propose PDASC (Parametrizable Distributed Approximate Similarity Search with Clustering), a novel Approximate Nearest Neighbour search algorithm. We show that PDASC is suitable for operating in distributed data environments and for handling datasets defined in different topologies.
arXiv Detail & Related papers (2024-05-22T16:19:52Z) - DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other.
A novel clustering algorithm, DenMune is presented to meet this challenge.
It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
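The mutual nearest neighborhood idea behind DenMune can be illustrated with a rough sketch (not DenMune itself, which additionally uses density peaks): two points are linked only when each appears among the other's K nearest neighbors, which filters out one-sided links across sparse regions.

```python
import math

def knn_sets(points, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        nbrs.append(set(order[:k]))
    return nbrs

def mutual_knn_edges(points, k):
    """Edge (i, j) exists only when i and j are in each other's k-NN lists."""
    nbrs = knn_sets(points, k)
    return {(i, j) for i in range(len(points)) for j in nbrs[i]
            if i < j and i in nbrs[j]}
```

For example, with two tight, well-separated pairs of points and k = 1, only the two within-pair edges survive; no edge crosses the gap.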
arXiv Detail & Related papers (2023-09-23T16:18:00Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Spatiotemporal k-means [39.98633724527769]
We propose a spatiotemporal clustering method called spatiotemporal k-means (STkM) that is able to analyze multi-scale clusters.
We show how STkM can be extended to more complex machine learning tasks, particularly unsupervised region of interest detection and tracking in videos.
arXiv Detail & Related papers (2022-11-10T04:40:31Z) - Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z) - Stable and consistent density-based clustering via multiparameter persistence [49.1574468325115]
We consider the degree-Rips construction from topological data analysis. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance. We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.