Perfect Clustering for Sparse Directed Stochastic Block Models
- URL: http://arxiv.org/abs/2601.16427v1
- Date: Fri, 23 Jan 2026 03:53:20 GMT
- Title: Perfect Clustering for Sparse Directed Stochastic Block Models
- Authors: Behzad Aalipur, Yichen Qin,
- Abstract summary: We propose a non-spectral, two-stage procedure for community detection in sparse directed SBMs.<n>The method first estimates the directed probability matrix using a neighborhood-smoothing scheme tailored to the asymmetric setting.<n>We show that the proposed procedure performs reliably in regimes where directed spectral and score-based methods deteriorate.
- Score: 1.3464152928754485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exact recovery in stochastic block models (SBMs) is well understood in undirected settings, but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs often lack stability in asymmetric, low-degree regimes, and existing non-spectral approaches focus primarily on undirected or dense settings. We propose a fully non-spectral, two-stage procedure for community detection in sparse directed SBMs with potentially growing numbers of communities. The method first estimates the directed probability matrix using a neighborhood-smoothing scheme tailored to the asymmetric setting, and then applies $K$-means clustering to the estimated rows, thereby avoiding the limitations of eigen- or singular value decompositions in sparse, asymmetric networks. Our main theoretical contribution is a uniform row-wise concentration bound for the smoothed estimator, obtained through new arguments that control asymmetric neighborhoods and separate in- and out-degree effects. These results imply the exact recovery of all community labels with probability tending to one, under mild sparsity and separation conditions that allow both $γ_n \to 0$ and $K_n \to \infty$. Simulation studies, including highly directed, sparse, and non-symmetric block structures, demonstrate that the proposed procedure performs reliably in regimes where directed spectral and score-based methods deteriorate. To the best of our knowledge, this provides the first exact recovery guarantee for this class of non-spectral, neighborhood-smoothing methods in the sparse, directed setting.
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences.<n>We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices.<n>We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric.<n>We develop a unified uniform-stability framework for the Gradient Push (SGP) algorithm.<n>A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Statistical-Geometric Degeneracy in UAV Search: A Physics-Aware Asymmetric Filtering Approach [23.49656058107753]
Post-disaster survivor localization using Unmanned Aerial Vehicles (UAVs) faces a fundamental physical challenge.<n>Unlike standard Gaussian noise, signal reflection from debris introduces strictly non-negative ranging biases.<n>Existing robust estimators, typically designed with symmetric loss functions, implicitly rely on the assumption of error symmetry.<n>We propose a physically-grounded solution, the AsymmetricHuberEKF, which explicitly incorporates the non-negative physical prior of NLOS biases.
arXiv Detail & Related papers (2026-02-11T08:33:56Z) - Nonparametric Kernel Clustering with Bandit Feedback [9.68728390492671]
Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query items to receive noisy observations.<n>We introduce a kernel-based approach, which allows us to reform the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a Hilbert kernel space (RKHS)<n>We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS.
arXiv Detail & Related papers (2026-01-12T13:36:07Z) - Statistical Inference for Fuzzy Clustering [7.416766339318596]
Fuzzy $c$-means (FCM) allow mixed memberships and better capture uncertainty and gradual transitions.<n>We develop a new framework for weighted fuzzy $c$-means (WFCM) for settings with potential cluster size imbalance.
arXiv Detail & Related papers (2026-01-06T02:11:01Z) - Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that only relaxes the orthonormal constraint to derive clustering results.<n>To ensure a doubly constraint into a gradient, we transform the non-negative constraint into a class probability parameter.
arXiv Detail & Related papers (2025-09-23T09:14:39Z) - Gaussian Mixture Model with unknown diagonal covariances via continuous sparse regularization [0.6627152091494143]
We employ the Beurling-LASSO framework to simultaneously estimate the number of components and their parameters.<n>A key theoretical contribution is the identification of an explicit separation condition on mixture components.
arXiv Detail & Related papers (2025-09-16T09:42:44Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fr'echet regression paradigm.<n>Our first method uses these weights as the local average to solve the conditional Fr'echet mean.<n>Second method performs local linear Fr'echet regression, both significantly improving existing Fr'echet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z) - Exact Recovery in the General Hypergraph Stochastic Block Model [92.28929858529679]
This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph block model (d-HSBM)
We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it.
arXiv Detail & Related papers (2021-05-11T03:39:08Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable
Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.