Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models
- URL: http://arxiv.org/abs/2406.05340v2
- Date: Tue, 08 Oct 2024 06:01:43 GMT
- Title: Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models
- Authors: Yucheng Liu, Xiaodong Li
- Abstract summary: We investigate how to select the number of communities for weighted networks without full likelihood modeling.
We propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled in the same way as in the standard DCSBM.
Our method of selecting the number of communities is based on a sequential testing framework, and in each step the weighted DCSBM is fitted via a spectral clustering method.
- Abstract: We investigate how to select the number of communities for weighted networks without a full likelihood modeling. First, we propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled as the same as in standard DCSBM, while the variance profile matrix is assumed to be related to the mean adjacency matrix through a given variance function. Our method of selecting the number of communities is based on a sequential testing framework, and in each step the weighted DCSBM is fitted via some spectral clustering method. A key step is to carry out matrix scaling on the estimated variance profile matrix. The resulting scaling factors can be used to normalize the adjacency matrix, from which the testing statistic is obtained. Under mild conditions on the weighted DCSBM, our proposed procedure is shown to be consistent in estimating the true number of communities. Numerical experiments on both simulated and real-world network data also demonstrate the desirable empirical properties of our method.
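The matrix scaling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which scaling algorithm is used, so the code below is an assumption: a damped, Sinkhorn-style fixed-point iteration that finds positive factors `psi` such that each row of `diag(psi) V diag(psi)` sums to 1, followed by a normalization of a residual `A - M_hat` with those factors. The function names, the choice of the residual, the Poisson-like variance function `v(m) = m`, and the use of the spectral norm as a summary are all hypothetical and not taken from the paper.

```python
import numpy as np


def symmetric_scaling(V, n_iter=1000, tol=1e-10):
    """Find psi > 0 such that sum_j psi_i * V_ij * psi_j is close to 1 for all i.

    Uses a damped (geometric-mean) fixed-point update, an assumed
    variant of symmetric matrix scaling; V must be symmetric and nonnegative.
    """
    V = np.asarray(V, dtype=float)
    psi = np.ones(V.shape[0])
    for _ in range(n_iter):
        V_psi = V @ psi
        # Stop once every scaled row sum is within tol of 1.
        if np.max(np.abs(psi * V_psi - 1.0)) < tol:
            break
        # Geometric mean of the current psi and the plain update 1 / (V psi).
        psi = np.sqrt(psi / V_psi)
    return psi


def normalized_residual(A, M_hat, psi):
    """Rescale entry (i, j) of the residual A - M_hat by sqrt(psi_i * psi_j)."""
    s = np.sqrt(psi)
    return s[:, None] * (A - M_hat) * s[None, :]


# Toy one-community example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(0.5, 1.5, size=n)      # degree-correction parameters
M = 2.0 * np.outer(theta, theta)           # mean matrix theta_i * theta_j * B
np.fill_diagonal(M, 0.0)                   # no self-loops
V = M.copy()                               # assumed variance function v(m) = m

upper = np.triu(rng.poisson(M), k=1)       # sample weights above the diagonal
A = upper + upper.T                        # symmetric weighted adjacency matrix

psi = symmetric_scaling(V)
R = normalized_residual(A, M, psi)
stat = np.linalg.norm(R, 2)                # spectral norm as one possible summary
```

In a sequential testing procedure, `M` and `V` would be replaced by estimates from the fitted model at each candidate number of communities; here the true values are used only to keep the sketch short.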
Related papers
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis and matrices, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
arXiv Detail & Related papers (2023-08-26T06:12:33Z) - Classification of BCI-EEG based on augmented covariance matrix [0.0]
We propose a new framework based on the augmented covariance extracted from an autoregressive model to improve motor imagery classification.
We will test our approach on several datasets and several subjects using the MOABB framework.
arXiv Detail & Related papers (2023-02-09T09:04:25Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions [16.876111500144667]
We introduce a novel probabilistic clustering method, referred to as k-sBetas.
We provide a general maximum a posteriori (MAP) perspective of clustering distributions.
Our code and comparisons with the existing simplex-clustering approaches and our introduced softmax-prediction benchmarks are publicly available.
arXiv Detail & Related papers (2022-07-30T18:29:11Z) - A Quadrature Rule combining Control Variates and Adaptive Importance Sampling [0.0]
We show that a simple weighted least squares approach can be used to improve the accuracy of Monte Carlo integration estimates.
Our main result is a non-asymptotic bound on the probabilistic error of the procedure.
The good behavior of the method is illustrated empirically on synthetic examples and real-world data for Bayesian linear regression.
arXiv Detail & Related papers (2022-05-24T08:21:45Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Community Detection in the Stochastic Block Model by Mixed Integer Programming [3.8073142980733]
The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model to generate random graphs with community structure given an expected degree sequence.
The standard approach to community detection based on the DCSBM is to search for the model parameters that are most likely to have produced the observed network data through maximum likelihood estimation (MLE).
We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph.
arXiv Detail & Related papers (2021-01-26T22:04:40Z) - Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Sparse Covariance Estimation in Logit Mixture Models [0.0]
This paper introduces a new data-driven methodology for estimating sparse covariance matrices of the random coefficients in logit mixture models.
Our objective is to find optimal subsets of correlated coefficients for which we estimate covariances.
arXiv Detail & Related papers (2020-01-14T20:19:15Z) - Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)