Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering
- URL: http://arxiv.org/abs/2403.19516v1
- Date: Thu, 28 Mar 2024 15:47:13 GMT
- Title: Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering
- Authors: Mihai Cucuringu, Xiaowen Dong, Ning Zhang,
- Abstract summary: We formulate clustering as estimating underlying communities in the directed block model.
We introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm.
- Score: 22.421702511126373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the directed graph clustering problem through the lens of statistics, where we formulate clustering as estimating underlying communities in the directed stochastic block model (DSBM). We conduct the maximum likelihood estimation (MLE) on the DSBM and thereby ascertain the most probable community assignment given the observed graph structure. In addition to the statistical point of view, we further establish the equivalence between this MLE formulation and a novel flow optimization heuristic, which jointly considers two important directed graph statistics: edge density and edge orientation. Building on this new formulation of directed clustering, we introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm. We provide a theoretical upper bound on the number of misclustered vertices of the spectral clustering algorithm using tools from matrix perturbation theory. We compare, both quantitatively and qualitatively, our proposed algorithms with existing directed clustering methods on both synthetic and real-world data, thus providing further ground to our theoretical contributions.
Related papers
- Regularized Projection Matrix Approximation with Applications to Community Detection [1.5033631151609534]
This paper introduces a regularized projection matrix approximation framework aimed at recovering cluster information from the affinity matrix.
We explore three distinct penalty functions addressing bounded, positive, and sparse scenarios, respectively, and derive the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the problem.
arXiv Detail & Related papers (2024-05-26T15:18:22Z) - Fuzzy K-Means Clustering without Cluster Centroids [79.19713746387337]
Fuzzy K-Means clustering is a critical computation technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Distributed Linear Regression with Compositional Covariates [5.085889377571319]
We focus on the distributed sparse penalized linear log-contrast model in massive compositional data.
Two distributed optimization techniques are proposed for solving the two different constrained convex optimization problems.
In the decentralized topology, we introduce a distributed coordinate-wise descent algorithm for obtaining a communication-efficient regularized estimation.
arXiv Detail & Related papers (2023-10-21T11:09:37Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering based on semi-non-negative tensor factorization (Semi-NTF)
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - A Modular Framework for Centrality and Clustering in Complex Networks [0.6423239719448168]
In this paper, we study two important such network analysis techniques, namely, centrality and clustering.
An information-flow based model is adopted for clustering, which itself builds upon an information theoretic measure for computing centrality.
Our clustering naturally inherits the flexibility to accommodate edge directionality, as well as different interpretations and interplay between edge weights and node degrees.
arXiv Detail & Related papers (2021-11-23T03:01:29Z) - Spatially Coherent Clustering Based on Orthogonal Nonnegative Matrix
Factorization [0.0]
We introduce in this work clustering models based on a total variation (TV) regularization procedure on the cluster membership matrix.
We provide a numerical evaluation of all proposed methods on a hyperspectral dataset obtained from a matrix-assisted laser desorption/ionisation imaging measurement.
arXiv Detail & Related papers (2021-04-25T23:40:41Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit [10.8145995157397]
High-order clustering aims to identify heterogeneous substructure in multiway dataset.
Non- computation and nature of the problem poses significant challenges in both statistics and statistics.
arXiv Detail & Related papers (2020-12-18T00:48:27Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.