Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block
Models, and Multi-layer Networks
- URL: http://arxiv.org/abs/2311.15598v1
- Date: Mon, 27 Nov 2023 07:48:50 GMT
- Title: Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block
Models, and Multi-layer Networks
- Authors: Zhongyuan Lyu, Ting Li, Dong Xia
- Abstract summary: We study the fundamental limit of clustering networks when a multi-layer network is present.
Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form.
We propose a novel two-stage network clustering method comprising a tensor-based initialization algorithm, involving both node and sample splitting, and a likelihood-based Lloyd refinement.
- Score: 9.57586103097079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we first study the fundamental limit of clustering networks
when a multi-layer network is present. Under the mixture multi-layer stochastic
block model (MMSBM), we show that the minimax optimal network clustering error
rate takes an exponential form and is characterized by the Rényi
divergence between the edge probability distributions of the component
networks. We propose a novel two-stage network clustering method including a
tensor-based initialization algorithm involving both node and sample splitting
and a refinement procedure using a likelihood-based Lloyd algorithm. Network
clustering must be accompanied by node community detection. Our proposed
algorithm achieves the minimax optimal network clustering error rate and allows
extreme network sparsity under MMSBM. Numerical simulations and real data
experiments both validate that our method outperforms existing methods.
Oftentimes, the edges of networks carry count-type weights. We then extend our
methodology and analysis framework to study the minimax optimal clustering
error rate for mixtures of discrete distributions, including Binomial, Poisson,
and multi-layer Poisson networks. The minimax optimal clustering error rates in
these discrete mixtures all take the same exponential form characterized by the
Rényi divergences. These optimal clustering error rates in discrete mixtures
can also be achieved by our proposed two-stage clustering algorithm.
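For orientation, the order-1/2 Rényi divergence driving these exponential rates is defined below for discrete distributions P and Q; the displayed rate is a schematic reading of the abstract, with J standing for an aggregate of pairwise Rényi divergences whose exact form is given in the paper, not here.

    % Order-1/2 Renyi divergence between discrete distributions P and Q:
    \[
      D_{1/2}(P \,\|\, Q) = -2 \log \sum_{x} \sqrt{P(x)\, Q(x)} .
    \]
    % Schematic form of the minimax clustering error rate; J aggregates
    % pairwise Renyi divergences between the component edge-probability
    % distributions (exact aggregation as in the paper):
    \[
      \inf_{\hat{z}} \sup \; \mathbb{E}\, \ell(\hat{z}, z)
        \asymp \exp\!\bigl( -(1 + o(1))\, J \bigr).
    \]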
Related papers
- Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap, which measures the degree of non-IID-ness in the data distribution, is presented.
Solutions addressing the challenges posed by non-IID conditions are proposed on the basis of this analysis.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models [8.097200145973389]
We first establish a universal lower bound for the error rate in clustering any mixture model.
We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails.
For datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family.
In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
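As a concrete illustration of the Bregman hard clustering mentioned above, below is a minimal sketch for one-dimensional Poisson counts; the divergence is the one induced by the Poisson family, and all function names are illustrative rather than taken from the paper.

    import numpy as np

    def poisson_bregman_div(x, mu, eps=1e-12):
        # Bregman divergence induced by the Poisson family:
        # d(x, mu) = x*log(x/mu) - (x - mu), with 0*log(0) := 0.
        x = np.asarray(x, dtype=float)
        xlogx = np.where(x > 0, x * np.log(np.maximum(x, eps) / mu), 0.0)
        return xlogx - (x - mu)

    def bregman_hard_clustering(x, k, n_iter=50, seed=None):
        # Lloyd-style hard clustering of 1-D counts under the Poisson
        # Bregman divergence; cluster means are the Bregman-optimal centroids.
        rng = np.random.default_rng(seed)
        centers = rng.choice(x, size=k, replace=False).astype(float) + 0.5
        for _ in range(n_iter):
            # Assignment step: nearest centroid in Bregman divergence.
            d = np.stack([poisson_bregman_div(x, m) for m in centers], axis=1)
            labels = d.argmin(axis=1)
            # Update step: plain means, floored to stay positive.
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = max(x[labels == j].mean(), 1e-3)
        return labels, centers

    # Toy usage on a two-component Poisson mixture.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.poisson(3, 500), rng.poisson(12, 500)])
    labels, centers = bregman_hard_clustering(x, k=2, seed=0)
    print(np.sort(centers))  # roughly [3, 12]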
arXiv Detail & Related papers (2024-02-23T16:51:17Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
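For context, an unfolded ISTA network stacks iterations of the map sketched below as layers; this shows the standard (non-smooth) soft-thresholding step for the LASSO objective, whereas the paper analyzes a smooth surrogate of it.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1 (standard, non-smooth version).
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista_layer(z, y, W, step, lam):
        # One unfolded ISTA layer for min_z ||W z - y||^2 / 2 + lam * ||z||_1:
        # a gradient step on the quadratic part, then soft-thresholding.
        return soft_threshold(z - step * W.T @ (W @ z - y), step * lam)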
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Layer Ensembles [95.42181254494287]
We introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network.
We show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run.
arXiv Detail & Related papers (2022-10-10T17:52:47Z) - Unsupervised Clustered Federated Learning in Complex Multi-source
Acoustic Environments [75.8001929811943]
We introduce a realistic and challenging, multi-source and multi-room acoustic environment.
We present an improved clustering control strategy that takes into account the variability of the acoustic scene.
The proposed approach is optimized using clustering-based measures and validated via a network-wide classification task.
arXiv Detail & Related papers (2021-06-07T14:51:39Z) - ALMA: Alternating Minimization Algorithm for Clustering Mixture
Multilayer Network [20.888592224540748]
The goal is to partition the multilayer network into clusters of similar layers, and to identify communities in those layers.
The present paper proposes a different technique, an alternating minimization algorithm (ALMA), that aims at simultaneous recovery of the layer partition and of the communities within the layers.
arXiv Detail & Related papers (2021-02-20T01:26:55Z) - Spectral clustering via adaptive layer aggregation for multi-layer
networks [6.0073653636512585]
We propose integrative spectral clustering approaches based on effective convex layer aggregations.
We show that our methods are remarkably competitive with several popular existing methods.
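A minimal sketch of the idea, assuming uniform aggregation weights as a placeholder for the adaptive, data-driven weights the paper actually proposes:

    import numpy as np
    from scipy.sparse.linalg import eigsh
    from sklearn.cluster import KMeans

    def aggregate_and_cluster(layers, k, weights=None):
        # Spectral clustering on a convex combination of adjacency layers.
        # layers: list of (n, n) symmetric adjacency matrices.
        # weights: convex weights over layers; uniform if None (the paper
        # learns these adaptively, uniform is only a placeholder).
        L = len(layers)
        w = np.full(L, 1.0 / L) if weights is None else np.asarray(weights)
        A = sum(wi * Ai for wi, Ai in zip(w, layers))
        # Leading k eigenvectors of the aggregated matrix, then k-means.
        _, vecs = eigsh(A, k=k, which="LA")
        return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)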
arXiv Detail & Related papers (2020-12-07T21:58:18Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical guarantees.
Experiments on several datasets demonstrate the effectiveness of our method and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
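For reference, the generic network Lasso objective over a graph G = (V, E) with edge weights A_{jk} takes the form below; in the local-clustering setting the local losses f_i encode the seed-node labels (notation assumed here, not taken from the paper).

    \[
      \min_{\{x_i\}_{i \in V}} \; \sum_{i \in V} f_i(x_i)
        \;+\; \lambda \sum_{(j,k) \in E} A_{jk}\, \| x_j - x_k \|_2 .
    \]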
arXiv Detail & Related papers (2020-04-25T17:52:05Z) - Randomized spectral co-clustering for large-scale directed networks [15.486507430729052]
Co-clustering aims to cluster the senders and receivers of directed networks simultaneously.
We leverage sketching techniques and derive two randomized spectral co-clustering algorithms.
We establish their approximation error rates and misclustering error rates, indicating better bounds than state-of-the-art results in the co-clustering literature.
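A generic sketch of the sketched-SVD idea: the paper derives two specific randomized variants, whereas this illustration simply runs scikit-learn's randomized SVD and k-means on the scaled singular-vector embeddings of a directed adjacency matrix A.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.utils.extmath import randomized_svd

    def randomized_cocluster(A, k, seed=0):
        # Sketched SVD of the directed adjacency matrix, then k-means on
        # the row (sender) and column (receiver) embeddings separately.
        U, s, Vt = randomized_svd(A, n_components=k, random_state=seed)
        senders = KMeans(n_clusters=k, n_init=10).fit_predict(U * s)
        receivers = KMeans(n_clusters=k, n_init=10).fit_predict(Vt.T * s)
        return senders, receivers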
arXiv Detail & Related papers (2020-04-25T15:00:55Z)