Average Sensitivity of Spectral Clustering
- URL: http://arxiv.org/abs/2006.04094v1
- Date: Sun, 7 Jun 2020 09:14:44 GMT
- Title: Average Sensitivity of Spectral Clustering
- Authors: Pan Peng, Yuichi Yoshida
- Abstract summary: We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
- Score: 31.283432482502278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering is one of the most popular clustering methods for finding
clusters in a graph, which has found many applications in data mining. However,
the input graph in those applications may have many missing edges due to errors
in measurement, withholding for privacy reasons, or arbitrariness in data
conversion. To make reliable and efficient decisions based on spectral
clustering, we assess the stability of spectral clustering against edge
perturbations in the input graph using the notion of average sensitivity, which
is the expected size of the symmetric difference of the output clusters before
and after we randomly remove edges.
We first prove that the average sensitivity of spectral clustering is
proportional to $\lambda_2/\lambda_3^2$, where $\lambda_i$ is the $i$-th
smallest eigenvalue of the (normalized) Laplacian. We also prove an analogous
bound for $k$-way spectral clustering, which partitions the graph into $k$
clusters. Then, we empirically confirm our theoretical bounds by conducting
experiments on synthetic and real networks. Our results suggest that spectral
clustering is stable against edge perturbations when there is a cluster
structure in the input graph.
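The quantities in the bound can be checked numerically. The sketch below (not the paper's implementation; the planted two-cluster graph, the 2-way spectral bisection, and the single-edge-removal model are illustrative assumptions) computes $\lambda_2$ and $\lambda_3$ of the normalized Laplacian and estimates the average sensitivity of spectral bisection as the expected symmetric difference of the output cluster over a uniformly random edge removal:

```python
# Minimal sketch: empirical average sensitivity of 2-way spectral clustering.
# Assumes only numpy; the graph below is two dense blocks joined by one bridge edge.
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}; isolated vertices get degree 1 to avoid division by zero."""
    d = A.sum(axis=1)
    d[d == 0] = 1.0
    inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def spectral_bisect(A):
    """2-way spectral clustering: split by the sign of the Fiedler vector."""
    _, vecs = np.linalg.eigh(normalized_laplacian(A))
    fiedler = vecs[:, 1]  # eigenvector of the 2nd smallest eigenvalue
    return frozenset(np.flatnonzero(fiedler >= 0))

# Two blocks of 10 vertices, fully connected inside, one bridge edge between them.
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1.0
A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
A[0, 10] = A[10, 0] = 1.0

base = spectral_bisect(A)
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] > 0]

# Average sensitivity (single-edge-removal variant): expected size of the
# symmetric difference of the output cluster, averaged over the removed edge.
diffs = []
for (i, j) in edges:
    B = A.copy()
    B[i, j] = B[j, i] = 0.0
    out = spectral_bisect(B)
    comp = frozenset(range(n)) - out  # the two sides are interchangeable labels
    diffs.append(min(len(base ^ out), len(base ^ comp)))
avg_sensitivity = float(np.mean(diffs))

lam = np.sort(np.linalg.eigvalsh(normalized_laplacian(A)))
print(f"lambda_2={lam[1]:.4f}, lambda_3={lam[2]:.4f}, avg sensitivity={avg_sensitivity:.3f}")
```

With a clear cluster structure, $\lambda_2$ is small while $\lambda_3$ is bounded away from zero, and the measured sensitivity stays small, consistent with the $\lambda_2/\lambda_3^2$ bound above.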
Related papers
- HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity [55.27586970082595]
HeNCler is a novel approach for Heterophilous Node Clustering.
We show that HeNCler significantly enhances performance in node clustering tasks within heterophilous graph contexts.
arXiv Detail & Related papers (2024-05-27T11:04:05Z)
- MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion [0.6906005491572401]
Spectral clustering is popular and attractive due to its remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testing on real-world benchmarks and an application to face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Spectral Clustering on Large Datasets: When Does it Work? Theory from Continuous Clustering and Density Cheeger-Buser [0.17188280334580194]
In modern machine learning, many data sets are modeled as a large number of points drawn from a probability density function.
Our work suggests that Shi-Malik spectral clustering works well on data drawn from mixtures of Laplace distributions.
Our key tool is a new Cheeger-Buser inequality for all probability densities, including discontinuous ones.
arXiv Detail & Related papers (2023-05-11T03:08:49Z)
- A parameter-free graph reduction for spectral clustering and SpectralNet [1.5469452301122175]
We introduce a graph reduction method that does not require any parameters.
First, distances from every point $p$ to its neighbors are filtered using an adaptive threshold to only keep neighbors with similar surrounding density.
The edges that survive both steps form the constructed graph, which is then passed to spectral clustering and SpectralNet.
arXiv Detail & Related papers (2023-02-25T21:19:30Z)
- Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering [1.5469452301122175]
Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep the data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivered consistent performance despite the significant reduction in edges.
arXiv Detail & Related papers (2023-02-22T11:31:32Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Improving Spectral Clustering Using Spectrum-Preserving Node Reduction [1.52292571922932]
We use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets.
The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T01:43:12Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.