Rate-optimal community detection near the KS threshold via node-robust algorithms
- URL: http://arxiv.org/abs/2511.16613v1
- Date: Thu, 20 Nov 2025 18:11:01 GMT
- Title: Rate-optimal community detection near the KS threshold via node-robust algorithms
- Authors: Jingqiu Ding, Yiding Hua, Kasper Lindberg, David Steurer, Aleksandr Storozhenko
- Abstract summary: We study community detection in the symmetric $k$-stochastic block model, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$. A key technical contribution is a novel graph bisection algorithm via robust majority voting, which allows us to significantly improve the misclassification rate to $1/\mathrm{poly}(k)$ for the initial estimation near the KS threshold.
- Score: 43.490963168751364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study community detection in the \emph{symmetric $k$-stochastic block model}, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a polynomial-time algorithm that achieves the minimax-optimal misclassification rate \begin{equation*} \exp \Bigl(-\bigl(1 \pm o(1)\bigr) \tfrac{C}{k}\Bigr), \quad \text{where } C = (\sqrt{pn} - \sqrt{qn})^2, \end{equation*} whenever $C \ge K\,k^2\,\log k$ for some universal constant $K$, matching the Kesten--Stigum (KS) threshold up to a $\log k$ factor. Notably, this rate holds even when an adversary corrupts an $η\le \exp\bigl(- (1 \pm o(1)) \tfrac{C}{k}\bigr)$ fraction of the nodes. To the best of our knowledge, the minimax rate was previously only attainable either via computationally inefficient procedures [ZZ15] or via polynomial-time algorithms that require strictly stronger assumptions such as $C \ge K k^3$ [GMZZ17]. In the node-robust setting, the best known algorithm requires the substantially stronger condition $C \ge K k^{102}$ [LM22]. Our results close this gap by providing the first polynomial-time algorithm that achieves the minimax rate near the KS threshold in both settings. Our work has two key technical contributions: (1) we robustify majority voting via the Sum-of-Squares framework, (2) we develop a novel graph bisection algorithm via robust majority voting, which allows us to significantly improve the misclassification rate to $1/\mathrm{poly}(k)$ for the initial estimation near the KS threshold.
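The quantities in the abstract can be evaluated numerically. The following is a minimal sketch, not the paper's algorithm: it computes the signal-to-noise ratio $C = (\sqrt{pn} - \sqrt{qn})^2$, the leading-order rate $\exp(-C/k)$ with the $(1 \pm o(1))$ factor dropped, and the condition $C \ge K\,k^2\,\log k$. The function names and the default `K = 1.0` are assumptions made for illustration; the abstract does not state the value of the universal constant $K$.

```python
import math


def snr(n: int, p: float, q: float) -> float:
    """Signal-to-noise ratio C = (sqrt(p*n) - sqrt(q*n))^2 from the abstract."""
    return (math.sqrt(p * n) - math.sqrt(q * n)) ** 2


def leading_order_rate(n: int, k: int, p: float, q: float) -> float:
    """exp(-C/k): the minimax rate with the (1 +/- o(1)) factor ignored."""
    return math.exp(-snr(n, p, q) / k)


def meets_condition(n: int, k: int, p: float, q: float, K: float = 1.0) -> bool:
    """Check C >= K * k^2 * log(k).

    K = 1.0 is a placeholder (assumption); the paper's constant is unspecified here.
    """
    return snr(n, p, q) >= K * k**2 * math.log(k)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to illustrate the formulas.
    n, k, p, q = 10_000, 4, 0.04, 0.01
    print("C =", snr(n, p, q))
    print("exp(-C/k) =", leading_order_rate(n, k, p, q))
    print("C >= K k^2 log k:", meets_condition(n, k, p, q))
```

With these illustrative parameters, $pn = 400$ and $qn = 100$, so $C = (20 - 10)^2 = 100$ and the leading-order rate is $\exp(-25)$. Under the abstract's assumption, this rate also bounds the fraction $η$ of corrupted nodes that can be tolerated.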
Related papers
- Incremental (k, z)-Clustering on Graphs [2.3322477552758234]
We develop a randomized incremental $(k, z)$-clustering algorithm that maintains with high probability a constant-factor approximation. In the first stage, we maintain a constant-factor bicriteria approximate solution of size $\tilde{O}(k)$ with a total update time of $m^{1+o(1)}$ over all adversarial edge insertions. In the second stage, we maintain a constant-factor approximate $(k,z)$-clustering solution on a dynamic weighted instance induced by the bicriteria approximate solution.
arXiv Detail & Related papers (2026-02-09T11:43:10Z) - Near-Exponential Savings for Mean Estimation with Active Learning [5.681847365688839]
We propose an active learning algorithm ("PartiBandits") to estimate $\mathbb{E}[Y]$. PartiBandits bridges the UCB and disagreement-based approaches to active learning.
arXiv Detail & Related papers (2025-11-07T21:48:55Z) - Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains. We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z) - Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point [3.793609515750114]
We study the problem of robustly estimating the edge density of Erdős-Rényi random graphs $G(n, d^\circ/n)$. Our algorithm is based on the sum-of-squares hierarchy.
arXiv Detail & Related papers (2025-03-05T21:45:17Z) - Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
We show that SignSGD with Majority Voting can robustly work on the whole range of complexity with $\kappa$ …
arXiv Detail & Related papers (2025-02-11T19:54:11Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Do you know what q-means? [42.96240569413475]
We present a classical $\varepsilon$-$k$-means algorithm that performs an approximate version of one iteration of Lloyd's algorithm with time complexity. We also propose an improved $q$-means quantum algorithm with time complexity.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Minimax Rates for Robust Community Detection [19.229475414802213]
We study the problem of community detection in the block model with adversarial node corruptions.
Our main result is an efficient algorithm that can tolerate an $\epsilon$-fraction of corruptions and achieves error $O(\epsilon) + e^{-\frac{C}{2}(1 \pm o(1))}$, where $C = (\sqrt{a} - \sqrt{b})^2$ is the signal-to-noise ratio.
We show that our algorithms are doubly-robust in the sense that they work in an even more
arXiv Detail & Related papers (2022-07-25T04:45:16Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.