Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering
- URL: http://arxiv.org/abs/2403.19516v1
- Date: Thu, 28 Mar 2024 15:47:13 GMT
- Title: Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering
- Authors: Mihai Cucuringu, Xiaowen Dong, Ning Zhang
- Abstract summary: We formulate clustering as estimating underlying communities in the directed stochastic block model (DSBM).
We introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm.
- Score: 22.421702511126373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the directed graph clustering problem through the lens of statistics, where we formulate clustering as estimating underlying communities in the directed stochastic block model (DSBM). We conduct the maximum likelihood estimation (MLE) on the DSBM and thereby ascertain the most probable community assignment given the observed graph structure. In addition to the statistical point of view, we further establish the equivalence between this MLE formulation and a novel flow optimization heuristic, which jointly considers two important directed graph statistics: edge density and edge orientation. Building on this new formulation of directed clustering, we introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm. We provide a theoretical upper bound on the number of misclustered vertices of the spectral clustering algorithm using tools from matrix perturbation theory. We compare, both quantitatively and qualitatively, our proposed algorithms with existing directed clustering methods on both synthetic and real-world data, thus providing further ground to our theoretical contributions.
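As a rough illustration of the two statistics the abstract names, edge density and edge orientation, the sketch below samples a small graph from a DSBM and measures both between a pair of clusters. The parameterization is an assumption and is not taken from the paper: `p` is the within-cluster edge probability, `q` the between-cluster edge probability, and `eta` the probability that a between-cluster edge points from the lower-numbered to the higher-numbered cluster. The function names `sample_dsbm` and `cluster_pair_stats` are hypothetical, and the code is not the authors' MLE, spectral, or SDP algorithm.

```python
import random


def sample_dsbm(sizes, p, q, eta, seed=0):
    """Sample a directed graph from an assumed DSBM parameterization.

    sizes: list of cluster sizes; vertices are numbered cluster by cluster.
    p, q:  edge probabilities within and between clusters (assumed names).
    eta:   probability that a between-cluster edge points from the
           lower-numbered cluster to the higher-numbered one.
    Returns (labels, edges), where edges is a list of (source, target).
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            # Decide whether the pair is connected at all.
            if rng.random() >= (p if same else q):
                continue
            # Within a cluster the direction is a fair coin flip; across
            # clusters it is biased by eta (labels[i] < labels[j] here).
            forward = rng.random() < (0.5 if same else eta)
            edges.append((i, j) if forward else (j, i))
    return labels, edges


def cluster_pair_stats(labels, edges, a, b):
    """Return (edge density, orientation) between clusters a and b.

    Density is the fraction of cross-cluster pairs that are connected;
    orientation is the fraction of those edges that point from a to b.
    """
    forward = sum(1 for u, v in edges if labels[u] == a and labels[v] == b)
    backward = sum(1 for u, v in edges if labels[u] == b and labels[v] == a)
    total = forward + backward
    density = total / (labels.count(a) * labels.count(b))
    orientation = forward / total if total else 0.5
    return density, orientation


if __name__ == "__main__":
    labels, edges = sample_dsbm([30, 30], p=0.5, q=0.5, eta=0.9, seed=1)
    density, orientation = cluster_pair_stats(labels, edges, 0, 1)
    print(f"density={density:.3f} orientation={orientation:.3f}")
```

With `p` equal to `q`, the two clusters are indistinguishable by density alone, and only the orientation statistic separates them. This is the kind of setting in which a method that jointly considers both statistics would be expected to help.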
Related papers
- Distributed Linear Regression with Compositional Covariates [5.085889377571319]
We focus on the distributed sparse penalized linear log-contrast model in massive compositional data.
Two distributed optimization techniques are proposed for solving the two different constrained convex optimization problems.
In the decentralized topology, we introduce a distributed coordinate-wise descent algorithm for obtaining a communication-efficient regularized estimation.
arXiv Detail & Related papers (2023-10-21T11:09:37Z)
- One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z)
- Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
In flow-based clusterings, the edges between clusters tend to be oriented in one direction; such clusterings have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
- A Modular Framework for Centrality and Clustering in Complex Networks [0.6423239719448168]
In this paper, we study two important network analysis techniques, namely centrality and clustering.
An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality.
Our clustering naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees.
arXiv Detail & Related papers (2021-11-23T03:01:29Z)
- Spatially Coherent Clustering Based on Orthogonal Nonnegative Matrix Factorization [0.0]
We introduce in this work clustering models based on a total variation (TV) regularization procedure on the cluster membership matrix.
We provide a numerical evaluation of all proposed methods on a hyperspectral dataset obtained from a matrix-assisted laser desorption/ionisation imaging measurement.
arXiv Detail & Related papers (2021-04-25T23:40:41Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit [10.8145995157397]
High-order clustering aims to identify heterogeneous substructures in multiway datasets.
The non-convex nature of the problem poses significant challenges in both statistics and computation.
arXiv Detail & Related papers (2020-12-18T00:48:27Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.