Average Sensitivity of Spectral Clustering
- URL: http://arxiv.org/abs/2006.04094v1
- Date: Sun, 7 Jun 2020 09:14:44 GMT
- Title: Average Sensitivity of Spectral Clustering
- Authors: Pan Peng, Yuichi Yoshida
- Abstract summary: We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
- Score: 31.283432482502278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering is one of the most popular clustering methods for finding
clusters in a graph, which has found many applications in data mining. However,
the input graph in those applications may have many missing edges due to errors
in measurement, withholding for privacy reasons, or arbitrariness in data
conversion. To make reliable and efficient decisions based on spectral
clustering, we assess the stability of spectral clustering against edge
perturbations in the input graph using the notion of average sensitivity, which
is the expected size of the symmetric difference of the output clusters before
and after we randomly remove edges.
We first prove that the average sensitivity of spectral clustering is
proportional to $\lambda_2/\lambda_3^2$, where $\lambda_i$ is the $i$-th
smallest eigenvalue of the (normalized) Laplacian. We also prove an analogous
bound for $k$-way spectral clustering, which partitions the graph into $k$
clusters. Then, we empirically confirm our theoretical bounds by conducting
experiments on synthetic and real networks. Our results suggest that spectral
clustering is stable against edge perturbations when there is a cluster
structure in the input graph.
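The quantity studied above can be illustrated concretely. The following is a minimal sketch, not the paper's algorithm: it performs 2-way spectral clustering by taking the sign of the second eigenvector of the normalized Laplacian, and estimates average sensitivity by Monte-Carlo sampling of single-edge deletions, measuring the symmetric difference of the cluster containing node 0 (an illustrative reference choice) before and after deletion.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_bisect(A):
    """2-way spectral clustering: sign of the second-smallest eigenvector."""
    w, v = np.linalg.eigh(normalized_laplacian(A))
    return v[:, 1] >= 0  # boolean cluster labels

def average_sensitivity(A, trials=200, seed=0):
    """Monte-Carlo estimate of average sensitivity: expected symmetric
    difference of the cluster containing node 0, before and after
    deleting one uniformly random edge."""
    rng = np.random.default_rng(seed)
    base = spectral_bisect(A)
    base_set = set(np.flatnonzero(base == base[0]))
    edges = np.argwhere(np.triu(A) > 0)
    diffs = []
    for _ in range(trials):
        i, j = edges[rng.integers(len(edges))]
        B = A.copy()
        B[i, j] = B[j, i] = 0.0
        lab = spectral_bisect(B)
        new_set = set(np.flatnonzero(lab == lab[0]))
        diffs.append(len(base_set ^ new_set))
    return float(np.mean(diffs))
```

On a graph with a clear cluster structure (e.g. two cliques joined by a bridge), most edge deletions leave the bisection unchanged, so the estimate stays near zero, consistent with the paper's $\lambda_2/\lambda_3^2$ bound being small when the spectral gap is pronounced.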
Related papers
- On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models [11.137244714693779]
We study the robustness of spectral algorithms against semirandom adversaries.
We identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent.
arXiv Detail & Related papers (2024-12-18T20:35:02Z) - HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity [55.27586970082595]
HeNCler is a novel approach for Heterophilous Node Clustering.
We show that HeNCler significantly enhances performance in node clustering tasks within heterophilous graph contexts.
arXiv Detail & Related papers (2024-05-27T11:04:05Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Spectral Clustering on Large Datasets: When Does it Work? Theory from Continuous Clustering and Density Cheeger-Buser [0.17188280334580194]
In modern machine learning, many data sets are modeled as a large number of points drawn from a probability density function.
Our work suggests that Shi-Malik spectral clustering works well on data drawn from mixtures of Laplace distributions.
Our key tool is a new Cheeger-Buser inequality for all probability densities, including discontinuous ones.
arXiv Detail & Related papers (2023-05-11T03:08:49Z) - A parameter-free graph reduction for spectral clustering and SpectralNet [1.5469452301122175]
We introduce a graph reduction method that does not require any parameters.
First, distances from every point $p$ to its neighbors are filtered using an adaptive threshold to only keep neighbors with similar surrounding density.
The edges that survive both steps form the graph that is passed to spectral clustering and to SpectralNet.
arXiv Detail & Related papers (2023-02-25T21:19:30Z) - Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering [1.5469452301122175]
Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep all data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivered a consistent performance despite significant reduction of edges.
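One common way to prune a $k$-nearest neighbor graph while keeping every data point is to retain only mutual edges. This is a hedged sketch of that generic idea, not the refinement rule of the paper above: an edge i-j survives only when i is among j's k nearest neighbors and j is among i's.

```python
import numpy as np

def mutual_knn_graph(X, k):
    """Sketch of k-NN edge reduction (illustrative, not the paper's
    exact rule): keep an edge i-j only when i and j are each among
    the other's k nearest neighbours.  All points are kept; only
    edges are pruned."""
    # Pairwise Euclidean distances; inf on the diagonal excludes self-loops.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]          # k nearest neighbours per point
    knn = np.zeros_like(D, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    knn[rows, nn.ravel()] = True               # directed k-NN adjacency
    return knn & knn.T                          # keep mutual edges only
```

The mutual-edge condition makes the adjacency symmetric by construction and typically removes cross-cluster links, since a far-away point is rarely a nearest neighbor in both directions.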
arXiv Detail & Related papers (2023-02-22T11:31:32Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - Improving Spectral Clustering Using Spectrum-Preserving Node Reduction [1.52292571922932]
We use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets.
The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-10-24T01:43:12Z) - Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
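The random-walk Laplacian embedding mentioned above can be computed via the symmetric normalized Laplacian and a degree rescaling. This is a generic sketch of that standard construction, not the paper's full method: if $L_{\mathrm{sym}} v = \lambda v$, then $D^{-1/2} v$ is an eigenvector of $L_{\mathrm{rw}} = I - D^{-1}A$ with the same eigenvalue.

```python
import numpy as np

def random_walk_embedding(A, k):
    """Embed nodes via the first k eigenpairs of the random-walk
    Laplacian L_rw = I - D^{-1} A.  We solve the symmetric problem
    for L_sym = I - D^{-1/2} A D^{-1/2} (same eigenvalues) and
    rescale the eigenvectors by D^{-1/2}."""
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    w, v = np.linalg.eigh(L_sym)               # ascending eigenvalues
    return w[:k], d_is[:, None] * v[:, :k]     # columns: L_rw eigenvectors
```

For a connected graph the smallest eigenvalue is 0 with a constant eigenvector of L_rw, and the returned columns satisfy the eigen-equation of L_rw exactly, which is what makes the representation insensitive to raw node degree.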
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.