Community Detection in the Hypergraph SBM: Exact Recovery Given the
Similarity Matrix
- URL: http://arxiv.org/abs/2208.12227v3
- Date: Sat, 14 Oct 2023 18:07:48 GMT
- Title: Community Detection in the Hypergraph SBM: Exact Recovery Given the
Similarity Matrix
- Authors: Julia Gaudio, Nirmit Joshi
- Abstract summary: We study the performance of algorithms which operate on the $similarity$ $matrix$ $W$, where $W_{ij}$ reports the number of hyperedges containing both $i$ and $j$.
We design a simple and highly efficient spectral algorithm with nearly linear runtime and show that it achieves the min-bisection threshold.
- Score: 1.74048653626208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Community detection is a fundamental problem in network science. In this
paper, we consider community detection in hypergraphs drawn from the
$hypergraph$ $stochastic$ $block$ $model$ (HSBM), with a focus on exact
community recovery. We study the performance of polynomial-time algorithms
which operate on the $similarity$ $matrix$ $W$, where $W_{ij}$ reports the
number of hyperedges containing both $i$ and $j$. Under this information model,
while the precise information-theoretic limit is unknown, Kim, Bandeira, and
Goemans derived a sharp threshold up to which the natural min-bisection
estimator on $W$ succeeds. As min-bisection is NP-hard in the worst case, they
additionally proposed a semidefinite programming (SDP) relaxation and
conjectured that it achieves the same recovery threshold as the min-bisection
estimator.
In this paper, we confirm this conjecture. We also design a simple and highly
efficient spectral algorithm with nearly linear runtime and show that it
achieves the min-bisection threshold. Moreover, the spectral algorithm also
succeeds in denser regimes and is considerably more efficient than previous
approaches, establishing it as the method of choice. Our analysis of the
spectral algorithm crucially relies on strong $entrywise$ bounds on the
eigenvectors of $W$. Our bounds are inspired by the work of Abbe, Fan, Wang,
and Zhong, who developed entrywise bounds for eigenvectors of symmetric
matrices with independent entries. Despite the complex dependency structure in
similarity matrices, we prove similar entrywise guarantees.
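As a rough illustration of the information model, the minimal Python sketch below builds the similarity matrix $W$ from a list of hyperedges and splits the vertices by the signs of an eigenvector of $W$. The function names and the toy hypergraph are hypothetical, and this plain eigenvector sign split is only a caricature: the paper's spectral algorithm involves additional steps not reproduced here.

```python
import numpy as np

def similarity_matrix(n, hyperedges):
    """Build the similarity matrix W, where W[i, j] counts the
    hyperedges containing both vertices i and j (W[i, i] = 0)."""
    W = np.zeros((n, n))
    for e in hyperedges:
        e = list(e)
        for a in range(len(e)):
            for b in range(a + 1, len(e)):
                W[e[a], e[b]] += 1
                W[e[b], e[a]] += 1
    return W

def spectral_bisection(W):
    """Partition vertices by the signs of the eigenvector associated with
    the second-largest eigenvalue of W (illustrative only; the paper's
    algorithm includes further steps beyond this sign split)."""
    vals, vecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    u2 = vecs[:, -2]                 # eigenvector of the second-largest eigenvalue
    return (u2 >= 0).astype(int)

# Toy usage: a small 3-uniform hypergraph on 6 vertices (illustrative input only).
edges = [(0, 1, 2), (0, 1, 2), (3, 4, 5), (3, 4, 5), (0, 3, 4)]
W = similarity_matrix(6, edges)
print(spectral_bisection(W))
```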
Related papers
- Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - Exact Community Recovery (under Side Information): Optimality of Spectral Algorithms [1.4732811715354452]
We study the problem of exact community recovery in general, two-community block models.
We provide a unified analysis of the effect of side information on the information-theoretic limits of exact recovery.
arXiv Detail & Related papers (2024-06-18T21:48:59Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares based proofs-to-algorithms approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z) - Sparse random hypergraphs: Non-backtracking spectra and community
detection [10.503525445174464]
We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al.
This is the first provable and efficient algorithm that achieves the conjectured threshold for HSBMs with $r$ blocks generated according to a general symmetric probability tensor.
arXiv Detail & Related papers (2022-03-14T17:45:03Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)