Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering
- URL: http://arxiv.org/abs/2302.11296v1
- Date: Wed, 22 Feb 2023 11:31:32 GMT
- Title: Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering
- Authors: Mashaan Alshammari, John Stavrakakis, Masahiro Takatsuka
- Abstract summary: Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep all data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivered consistent performance despite a significant reduction of edges.
- Score: 1.5469452301122175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering has become a popular choice for data clustering
for its ability to uncover clusters of different shapes. However, it is not
always preferred over other clustering methods due to its computational
demands. One effective way to bypass these demands is to perform spectral
clustering on a subset of points (data representatives) and then generalize the
clustering outcome; this is known as approximate spectral clustering (ASC). ASC
uses sampling or quantization to select data representatives. This makes it
vulnerable to 1) performance inconsistency (since these methods have a random
step in either initialization or training) and 2) local statistics loss
(because the pairwise similarities are extracted from data representatives
instead of data points). We propose a refined version of the $k$-nearest
neighbor graph, in which we keep all data points and aggressively reduce the
number of edges for computational efficiency. Local statistics are exploited to
keep the edges that do not violate the intra-cluster distances and to nullify
all other edges in the $k$-nearest neighbor graph. We also introduce an
optional step that automatically selects the number of clusters $C$. The
proposed method was tested on synthetic and real datasets. Compared to ASC
methods, it delivered consistent performance despite a significant reduction of
edges.
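As a rough illustration of the pipeline described above, the Python sketch below prunes a $k$-nearest neighbor graph using local distance statistics and then runs spectral clustering on the reduced graph. The pruning rule (mean plus `sigma_factor` standard deviations of each point's neighbor distances), the locally scaled Gaussian weights, and the eigengap heuristic for choosing $C$ are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def refined_knn_spectral(X, k=10, sigma_factor=1.0, C=None, C_max=10):
    """Prune a k-NN graph with local distance statistics, then run
    spectral clustering on what survives (illustrative sketch)."""
    # Dense k-NN distance matrix (zero where there is no edge).
    D = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()

    # Local statistics: mean/std of each point's k-NN distances.
    nn_d = np.where(D > 0, D, np.nan)
    mu = np.nanmean(nn_d, axis=1)
    thr = mu + sigma_factor * np.nanstd(nn_d, axis=1)

    # Keep an edge only if it respects the local threshold at BOTH
    # endpoints; nullify all other edges, then symmetrize.
    keep = (D > 0) & (D <= thr[:, None]) & (D <= thr[None, :])
    W = np.where(keep, np.exp(-D**2 / (mu[:, None] * mu[None, :] + 1e-12)), 0.0)
    W = np.maximum(W, W.T)

    # Normalized Laplacian of the reduced graph.
    L = laplacian(csr_matrix(W), normed=True).toarray()
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order

    # Optional step: choose C by the largest eigengap (heuristic).
    if C is None:
        C = int(np.argmax(np.diff(vals[:C_max + 1]))) + 1

    # Embed into the first C eigenvectors, row-normalize, cluster.
    U = vecs[:, :C]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=C, n_init=10).fit_predict(U), C
```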
Related papers
- MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion [0.6906005491572401]
Spectral clustering is popular and attractive due to its remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree-descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testing on real-world benchmarks and an application to face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed that enhances the cohesion of points within the same cluster and separates points in different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter (see the Butterworth-similarity sketch after this list).
To exploit the complementary information embedded in different views, we leverage tensor Schatten $p$-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- A parameter-free graph reduction for spectral clustering and SpectralNet [1.5469452301122175]
We introduce a graph reduction method that does not require any parameters.
First, distances from every point $p$ to its neighbors are filtered using an adaptive threshold to keep only neighbors with similar surrounding density (a sketch of such adaptive-threshold filtering appears after this list).
The edges that survive both steps form the constructed graph, which is passed to spectral clustering and SpectralNet.
arXiv Detail & Related papers (2023-02-25T21:19:30Z)
- Approximate spectral clustering with eigenvector selection and self-tuned $k$ [1.827510863075184]
The recently emerged spectral clustering surpasses conventional clustering methods by detecting clusters of any shape without a convexity assumption.
In practice, manually setting $k$ can be subjective or time-consuming.
The proposed algorithm has two relevance metrics for estimating $k$ in two vital steps of ASC.
The experiments demonstrate the efficiency of the proposed algorithm and its ability to compete with similar methods where $k$ was set manually.
arXiv Detail & Related papers (2023-02-22T11:32:24Z)
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data [1.827510863075184]
This paper proposes a method to cluster data that share the same intersections between two or more features.
The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm.
The algorithm is also able to find the optimal number of clusters for the FCM and the k-means algorithms, according to the consistency of the clusters given by the Silhouette Index (SI); a sketch of such silhouette-based selection appears after this list.
arXiv Detail & Related papers (2022-06-17T19:29:50Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets (a toy greedy DPP-style selection appears after this list).
We show through simulations that, contrary to DPPs, the standard uniform random initialization fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Average Sensitivity of Spectral Clustering [31.283432482502278]
We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
arXiv Detail & Related papers (2020-06-07T09:14:44Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
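For the Butterworth-filter construction mentioned in the Rethinking k-means entry, one plausible reading is to pass pairwise distances through the Butterworth magnitude response, so similarity stays near 1 below a cutoff and rolls off smoothly above it. The `cutoff` and `order` parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def butterworth_similarity(X, cutoff=1.0, order=2):
    # Butterworth magnitude response applied to pairwise distances:
    # s(d) = 1 / sqrt(1 + (d / cutoff)^(2 * order)).
    D = cdist(X, X)
    return 1.0 / np.sqrt(1.0 + (D / cutoff) ** (2 * order))
```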
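The adaptive-threshold step in the parameter-free graph reduction entry could look roughly like the sketch below, where a point's surrounding density is proxied by its mean $k$-NN distance and the tolerance is derived from the data rather than set by the user; the paper's actual rule may differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_filtered_edges(X, k=10):
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nbrs.kneighbors(X)      # column 0 is the point itself
    density = dist[:, 1:].mean(axis=1)  # mean k-NN distance per point
    tol = density.std()                 # data-derived, not user-set
    # Keep edge (p, j) only if the two local densities are similar.
    return [(p, j) for p in range(len(X))
            for j in idx[p, 1:]
            if abs(density[p] - density[j]) < tol]
```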
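The Silhouette-Index-driven selection of the cluster count in the ck-means entry reduces to a loop like the following; only the selection step is shown, with scikit-learn's k-means standing in and the fuzzy (FCM) half omitted.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_range=range(2, 11)):
    # Fit k-means for each candidate k and keep the k whose labels
    # score the highest Silhouette Index.
    scores = {
        k: silhouette_score(X, KMeans(n_clusters=k, n_init=10).fit_predict(X))
        for k in k_range
    }
    return max(scores, key=scores.get)
```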
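Lastly, a toy illustration of why DPPs favor diverse centers (determinantal consensus clustering entry): a greedy MAP-style selection that repeatedly adds whichever point most increases the determinant of the selected kernel submatrix. The RBF kernel, `gamma`, and the seeding heuristic are assumptions for illustration, not the paper's sampler.

```python
import numpy as np

def greedy_dpp_centers(X, m, gamma=1.0):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * D2)                 # RBF similarity kernel
    selected = [int(np.argmax(K.sum(1)))]   # seed with the densest point
    while len(selected) < m:
        best, best_det = None, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            # Larger determinant = more spread-out, diverse subset.
            sub = K[np.ix_(selected + [i], selected + [i])]
            logdet = np.linalg.slogdet(sub)[1]
            if logdet > best_det:
                best, best_det = i, logdet
        selected.append(best)
    return X[selected]
```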