Heteroskedastic Tensor Clustering
- URL: http://arxiv.org/abs/2311.02306v1
- Date: Sat, 4 Nov 2023 02:50:40 GMT
- Title: Heteroskedastic Tensor Clustering
- Authors: Yuchen Zhou, Yuxin Chen
- Abstract summary: We propose a two-stage method, named $mathsfHightext-orderHeteroClustering$ ($mathsfHHC$)
It starts by performing tensor subspace estimation via a novel spectral algorithm called $mathsfThresholdedDeflatedtext-HeteroPCA$, followed by approximate $k$-means to obtain cluster nodes.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit.
- Score: 20.979358557219953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tensor clustering, which seeks to extract underlying cluster structures from
noisy tensor observations, has gained increasing attention. One extensively
studied model for tensor clustering is the tensor block model, which postulates
the existence of clustering structures along each mode and has found broad
applications in areas like multi-tissue gene expression analysis and multilayer
network analysis. However, currently available computationally feasible methods
for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise
or suffer from suboptimal statistical performance, which restrains their
utility in applications that have to deal with heteroskedastic data and/or low
signal-to-noise-ratio (SNR).
To overcome these challenges, we propose a two-stage method, named
$\mathsf{High\text{-}order~HeteroClustering}$ ($\mathsf{HHC}$), which starts by
performing tensor subspace estimation via a novel spectral algorithm called
$\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$, followed by approximate
$k$-means to obtain cluster nodes. Encouragingly, our algorithm provably
achieves exact clustering as long as the SNR exceeds the computational limit
(ignoring logarithmic factors); here, the SNR refers to the ratio of the
pairwise disparity between nodes to the noise level, and the computational
limit indicates the lowest SNR that enables exact clustering with polynomial
runtime. Comprehensive simulation and real-data experiments suggest that our
algorithm outperforms existing algorithms across various settings, delivering
more reliable clustering performance.
Related papers
- Consistency of Lloyd's Algorithm Under Perturbations [11.56433365648479]
Lloyd's algorithm is one of the most widely used clustering algorithms.
We show that the mis-clustering rate of Lloyd's algorithm on samples from a sub-Gaussian mixture is exponentially bounded after $O(log(n))$ iterations.
arXiv Detail & Related papers (2023-09-01T16:45:52Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering based on semi-non-negative tensor factorization (Semi-NTF)
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Subspace clustering in high-dimensions: Phase transitions \&
Statistical-to-Computational gap [24.073221004661427]
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model.
We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
arXiv Detail & Related papers (2022-05-26T17:47:35Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit [10.8145995157397]
High-order clustering aims to identify heterogeneous substructure in multiway dataset.
Non- computation and nature of the problem poses significant challenges in both statistics and statistics.
arXiv Detail & Related papers (2020-12-18T00:48:27Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a on when to use it.
Our datasets depends on the eigen spectrum shape of the dataset, and yields competitive low-rank approximations in test compared to the other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Local Graph Clustering with Network Lasso [90.66817876491052]
We study the statistical and computational properties of a network Lasso method for local graph clustering.
The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes.
arXiv Detail & Related papers (2020-04-25T17:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.