Community Detection in the Hypergraph SBM: Exact Recovery Given the
Similarity Matrix
- URL: http://arxiv.org/abs/2208.12227v3
- Date: Sat, 14 Oct 2023 18:07:48 GMT
- Title: Community Detection in the Hypergraph SBM: Exact Recovery Given the
Similarity Matrix
- Authors: Julia Gaudio, Nirmit Joshi
- Abstract summary: We study the performance of algorithms which operate on the $similarity$matrix $W$, where $W_ij$ reports the number of hyperedges containing both $i$ and $j$.
We design a simple and highly efficient spectral algorithm with nearly linear runtime and show that it achieves the min-bisection threshold.
- Score: 1.74048653626208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Community detection is a fundamental problem in network science. In this
paper, we consider community detection in hypergraphs drawn from the
$hypergraph$ $stochastic$ $block$ $model$ (HSBM), with a focus on exact
community recovery. We study the performance of polynomial-time algorithms
which operate on the $similarity$ $matrix$ $W$, where $W_{ij}$ reports the
number of hyperedges containing both $i$ and $j$. Under this information model,
while the precise information-theoretic limit is unknown, Kim, Bandeira, and
Goemans derived a sharp threshold up to which the natural min-bisection
estimator on $W$ succeeds. As min-bisection is NP-hard in the worst case, they
additionally proposed a semidefinite programming (SDP) relaxation and
conjectured that it achieves the same recovery threshold as the min-bisection
estimator.
In this paper, we confirm this conjecture. We also design a simple and highly
efficient spectral algorithm with nearly linear runtime and show that it
achieves the min-bisection threshold. Moreover, the spectral algorithm also
succeeds in denser regimes and is considerably more efficient than previous
approaches, establishing it as the method of choice. Our analysis of the
spectral algorithm crucially relies on strong $entrywise$ bounds on the
eigenvectors of $W$. Our bounds are inspired by the work of Abbe, Fan, Wang,
and Zhong, who developed entrywise bounds for eigenvectors of symmetric
matrices with independent entries. Despite the complex dependency structure in
similarity matrices, we prove similar entrywise guarantees.
Related papers
- Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $textbfMA-OSMA$ algorithm to transfer the discrete submodular problem into a continuous optimization.
We also introduce a projection-free $textbfMA-OSEA$ algorithm, which effectively utilizes the KL divergence by mixing a uniform distribution.
Our algorithms significantly improve the $(frac11+c)$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model.
We show that the optimal error (up to polylogarithmic factors) is $f(alpha,N) + sqrtD/(alpha N)$, where the term $f(alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z) - SoS Certificates for Sparse Singular Values and Their Applications: Robust Statistics, Subspace Distortion, and More [37.208622097149714]
We give a new family of upper-time algorithms which can certify bounds on the maximum of $|M u|$.
Our certification algorithm makes essential use of the Sum-of-Squares hierarchy.
arXiv Detail & Related papers (2024-12-30T18:59:46Z) - Exact Community Recovery (under Side Information): Optimality of Spectral Algorithms [1.4732811715354452]
We study the problem of exact community recovery in general, two-community block models.
We provide a unified analysis of the effect of side information on the information-theoretic limits of exact recovery.
arXiv Detail & Related papers (2024-06-18T21:48:59Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares based, to algorithms approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z) - Sparse random hypergraphs: Non-backtracking spectra and community
detection [10.503525445174464]
We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al.
This is the first provable and efficient algorithm that achieves the conjectured threshold for HSBMs with $r$ blocks generated according to a general symmetric probability tensor.
arXiv Detail & Related papers (2022-03-14T17:45:03Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $ktimes k$, rank-$r$, matrices to normalized $L_1$ distance requires $Omega(frackrepsilon2)$ samples.
We propose an algorithm that uses $cal O(frackrepsilon2log2fracepsilon)$ samples, a number linear in the high dimension, and nearly linear in the matrices, typically low, rank proofs.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.