Related papers: Strong consistency and optimality of spectral clustering in symmetric binary non-uniform Hypergraph Stochastic Block Model

Strong consistency and optimality of spectral clustering in symmetric binary non-uniform Hypergraph Stochastic Block Model

URL: http://arxiv.org/abs/2306.06845v1
Date: Mon, 12 Jun 2023 03:38:25 GMT
Title: Strong consistency and optimality of spectral clustering in symmetric binary non-uniform Hypergraph Stochastic Block Model
Authors: Haixiao Wang
Abstract summary: We study the unsupervised classification problem in random hypergraphs under the non-uniform emphHypergraph Block Model (HSBM) We find that strong consistency is achievable by aggregating information from all uniform layers, even if it is impossible when each layer is considered alone.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Consider the unsupervised classification problem in random hypergraphs under the non-uniform \emph{Hypergraph Stochastic Block Model} (HSBM) with two equal-sized communities ($n/2$), where each edge appears independently with some probability depending only on the labels of its vertices. In this paper, an \emph{information-theoretical} threshold for strong consistency is established. Below the threshold, every algorithm would misclassify at least two vertices with high probability, and the expected \emph{mismatch ratio} of the eigenvector estimator is upper bounded by $n$ to the power of minus the threshold. On the other hand, when above the threshold, despite the information loss induced by tensor contraction, one-stage spectral algorithms assign every vertex correctly with high probability when only given the contracted adjacency matrix, even if \emph{semidefinite programming} (SDP) fails in some scenarios. Moreover, strong consistency is achievable by aggregating information from all uniform layers, even if it is impossible when each layer is considered alone. Our conclusions are supported by both theoretical analysis and numerical experiments.

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity. In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem. We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z)
Optimal and exact recovery on general non-uniform Hypergraph Stochastic Block Model [0.0]
We consider the community detection problem in random hypergraphs under the non-uniform hypergraphinger block model (HSBM) We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints. We provide two efficient algorithms which successfully achieve exact recovery when above the threshold, and attain the lowest possible ratio when the exact recovery is impossible.
arXiv Detail & Related papers (2023-04-25T20:30:33Z)
Multilayer hypergraph clustering using the aggregate similarity matrix [0.7373617024876725]
We consider the community recovery problem on a multilayer variant of the hypergraph block model (HSBM) In this work, we investigate a semidefinite programming (SDP) approach and obtain information-theoretic conditions on the model parameters that guarantee exact recovery.
arXiv Detail & Related papers (2023-01-27T11:15:46Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$. We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold [3.6052935394000234]
Block model is a canonical random graph model for clustering and community detection on network-structured data. No estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. We prove that with an arbitrary fraction of the labels feasible throughout the parameter domain.
arXiv Detail & Related papers (2022-05-24T00:03:25Z)
Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix. Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
Sparse random hypergraphs: Non-backtracking spectra and community detection [10.503525445174464]
We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al. This is the first provable and efficient algorithm that achieves the conjectured threshold for HSBMs with $r$ blocks generated according to a general symmetric probability tensor.
arXiv Detail & Related papers (2022-03-14T17:45:03Z)
Geometric Graph Representation Learning via Maximizing Rate Reduction [73.6044873825311]
Learning node representations benefits various downstream tasks in graph analysis such as community detection and node classification. We propose Geometric Graph Representation Learning (G2R) to learn node representations in an unsupervised manner. G2R maps nodes in distinct groups into different subspaces, while each subspace is compact and different subspaces are dispersed.
arXiv Detail & Related papers (2022-02-13T07:46:24Z)
Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model [6.681901523019242]
We consider the community detection problem in random hypergraphs under the nonuniform hypergraph block model (HSBM) We provide a spectral algorithm that outputs a partition with at least a $gamma$ fraction of the vertices classified correctly, where $gammain depends on the signal-to-noise ratio (SNR) of the model. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which can be of independent interest.
arXiv Detail & Related papers (2021-12-22T05:38:33Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
Exact Recovery in the General Hypergraph Stochastic Block Model [92.28929858529679]
This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph block model (d-HSBM) We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it.
arXiv Detail & Related papers (2021-05-11T03:39:08Z)
Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree. In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $tilde O(mathsfOPT1/2) + varepsilon$ in $mathrmpoly(d,1/varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.