Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit
- URL: http://arxiv.org/abs/2012.09996v2
- Date: Sun, 14 Mar 2021 14:55:54 GMT
- Title: Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit
- Authors: Rungang Han, Yuetian Luo, Miaoyan Wang, and Anru R. Zhang
- Abstract summary: High-order clustering aims to identify heterogeneous substructure in multiway dataset.
Non- computation and nature of the problem poses significant challenges in both statistics and statistics.
- Score: 10.8145995157397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-order clustering aims to identify heterogeneous substructure in multiway
dataset that arises commonly in neuroimaging, genomics, and social network
studies. The non-convex and discontinuous nature of the problem poses
significant challenges in both statistics and computation. In this paper, we
propose a tensor block model and the computationally efficient methods,
\emph{high-order Lloyd algorithm} (HLloyd) and \emph{high-order spectral
clustering} (HSC), for high-order clustering in tensor block model. The
convergence of the proposed procedure is established, and we show that our
method achieves exact clustering under reasonable assumptions. We also give the
complete characterization for the statistical-computational trade-off in
high-order clustering based on three different signal-to-noise ratio regimes.
Finally, we show the merits of the proposed procedures via extensive
experiments on both synthetic and real datasets.
Related papers
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering [22.421702511126373]
We formulate clustering as estimating underlying communities in the directed block model.
We introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm.
arXiv Detail & Related papers (2024-03-28T15:47:13Z) - Heteroskedastic Tensor Clustering [20.979358557219953]
We propose a two-stage method, named $mathsfHightext-orderHeteroClustering$ ($mathsfHHC$)
It starts by performing tensor subspace estimation via a novel spectral algorithm called $mathsfThresholdedDeflatedtext-HeteroPCA$, followed by approximate $k$-means to obtain cluster nodes.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit.
arXiv Detail & Related papers (2023-11-04T02:50:40Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering based on semi-non-negative tensor factorization (Semi-NTF)
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - High-dimensional variable clustering based on maxima of a weakly dependent random process [1.1999555634662633]
We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models.
This class of models is identifiable, meaning that there exists a maximal element with a partial order between partitions, allowing for statistical inference.
We also present an algorithm depending on a tuning parameter that recovers the clusters of variables without specifying the number of clusters empha priori.
arXiv Detail & Related papers (2023-02-02T08:24:26Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Multiway Spherical Clustering via Degree-Corrected Tensor Block Models [8.147652597876862]
We develop a degree-corrected block model with estimation accuracy guarantees.
In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater.
The efficacy of our procedure is demonstrated through two data applications.
arXiv Detail & Related papers (2022-01-19T03:40:22Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Tensor Clustering with Planted Structures: Statistical Optimality and
Computational Limits [9.427635404752936]
We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC)
We identify the boundaries of signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible.
We propose algorithms that achieve reliable detection and recovery when the signalto-noise ratio is above these thresholds.
arXiv Detail & Related papers (2020-05-21T15:53:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.