On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
- URL: http://arxiv.org/abs/2412.14315v1
- Date: Wed, 18 Dec 2024 20:35:02 GMT
- Title: On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
- Authors: Aditya Bhaskara, Agastya Vibhuti Jha, Michael Kapralov, Naren Sarayu Manoj, Davide Mazzali, Weronika Wrzos-Kaminska,
- Abstract summary: We study the robustness of spectral algorithms against semirandom adversaries.
We identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent.
- Score: 11.137244714693779
- License:
- Abstract: In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated community assignment based on the eigenvector corresponding to the second smallest eigenvalue of the Laplacian of $G$. Spectral algorithms can be shown to provably recover the cluster structure for graphs generated from certain probabilistic models, such as the Stochastic Block Model (SBM). However, spectral clustering is known to be non-robust to model mis-specification. Techniques based on semidefinite programming have been shown to be more robust, but they incur significant computational overheads. In this work, we study the robustness of spectral algorithms against semirandom adversaries. Informally, a semirandom adversary is allowed to ``helpfully'' change the specification of the model in a way that is consistent with the ground-truth solution. Our semirandom adversaries in particular are allowed to add edges inside clusters or increase the probability that an edge appears inside a cluster. Semirandom adversaries are a useful tool to determine the extent to which an algorithm has overfit to statistical assumptions on the input. On the positive side, we identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent, i.e., it exactly recovers the planted partitioning. On the negative side, we show that in these classes spectral bisection with the _normalized_ Laplacian outputs a partitioning that makes a classification mistake on a constant fraction of the vertices. Finally, we demonstrate numerical experiments that complement our theoretical findings.
Related papers
- Robustness for Spectral Clustering of General Graphs under Local Differential Privacy [2.4447259440166302]
We argue that social networks do not originate from the block model (SBM)
Our primary focus is the edge flipping method -- a common technique for protecting local differential privacy.
Although clustering outcomes have been stable for dense and well-clustered graphs produced from the SBM, we show that in general, spectral clustering may yield highly erratic results on certain dense and well-clustered graphs.
arXiv Detail & Related papers (2023-09-13T10:23:29Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - On consistency of constrained spectral clustering under
representation-aware stochastic block model [20.6072287343024]
We study constrained spectral clustering with the aim of finding balanced clusters in a given textitsimilarity graph $mathcalG$.
We develop unnormalized and normalized variants of spectral clustering in this setting.
These algorithms use $mathcalR$ to find clusters in $mathcalG$ that approximately satisfy the proposed constraint.
arXiv Detail & Related papers (2022-03-03T20:41:14Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications.
We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Shaping Deep Feature Space towards Gaussian Mixture for Visual
Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
arXiv Detail & Related papers (2020-11-18T03:32:27Z) - Average Sensitivity of Spectral Clustering [31.283432482502278]
We study the stability of spectral clustering against edge perturbations in the input graph.
Our results suggest that spectral clustering is stable against edge perturbations when there is a cluster structure in the input graph.
arXiv Detail & Related papers (2020-06-07T09:14:44Z) - Strong Consistency, Graph Laplacians, and the Stochastic Block Model [1.2891210250935143]
We study the performance of classical two-step spectral clustering via the graph Laplacian to learn the block model.
We prove that spectral clustering is able to achieve exact recovery of the planted community structure under conditions that match the information-theoretic limits.
arXiv Detail & Related papers (2020-04-21T07:16:46Z) - Randomized Spectral Clustering in Large-Scale Stochastic Block Models [13.366036495177923]
We study the spectral clustering using randomized sketching algorithms from a statistical perspective.
Under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm.
A new R package called Rclust is developed and made available to the public.
arXiv Detail & Related papers (2020-01-20T04:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.