Tensor Clustering with Planted Structures: Statistical Optimality and
Computational Limits
- URL: http://arxiv.org/abs/2005.10743v4
- Date: Mon, 2 Oct 2023 18:27:58 GMT
- Title: Tensor Clustering with Planted Structures: Statistical Optimality and
Computational Limits
- Authors: Yuetian Luo and Anru R. Zhang
- Abstract summary: We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC)
We identify the boundaries of signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible.
We propose algorithms that achieve reliable detection and recovery when the signalto-noise ratio is above these thresholds.
- Score: 9.427635404752936
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper studies the statistical and computational limits of high-order
clustering with planted structures. We focus on two clustering models, constant
high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and
study the methods and theory for testing whether a cluster exists (detection)
and identifying the support of cluster (recovery).
Specifically, we identify the sharp boundaries of signal-to-noise ratio for
which CHC and ROHC detection/recovery are statistically possible. We also
develop the tight computational thresholds: when the signal-to-noise ratio is
below these thresholds, we prove that polynomial-time algorithms cannot solve
these problems under the computational hardness conjectures of hypergraphic
planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS)
recovery. We also propose polynomial-time tensor algorithms that achieve
reliable detection and recovery when the signal-to-noise ratio is above these
thresholds. Both sparsity and tensor structures yield the computational
barriers in high-order tensor clustering. The interplay between them results in
significant differences between high-order tensor clustering and matrix
clustering in literature in aspects of statistical and computational phase
transition diagrams, algorithmic approaches, hardness conjecture, and proof
techniques. To our best knowledge, we are the first to give a thorough
characterization of the statistical and computational trade-off for such a
double computational-barrier problem. Finally, we provide evidence for the
computational hardness conjectures of HPC detection (via low-degree polynomial
and Metropolis methods) and HPDS recovery (via low-degree polynomial method).
Related papers
- Tensor cumulants for statistical inference on invariant distributions [49.80012009682584]
We show that PCA becomes computationally hard at a critical value of the signal's magnitude.
We define a new set of objects, which provide an explicit, near-orthogonal basis for invariants of a given degree.
It also lets us analyze a new problem of distinguishing between different ensembles.
arXiv Detail & Related papers (2024-04-29T14:33:24Z) - Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering [22.421702511126373]
We formulate clustering as estimating underlying communities in the directed block model.
We introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm.
arXiv Detail & Related papers (2024-03-28T15:47:13Z) - Heteroskedastic Tensor Clustering [20.979358557219953]
We propose a two-stage method, named $mathsfHightext-orderHeteroClustering$ ($mathsfHHC$)
It starts by performing tensor subspace estimation via a novel spectral algorithm called $mathsfThresholdedDeflatedtext-HeteroPCA$, followed by approximate $k$-means to obtain cluster nodes.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit.
arXiv Detail & Related papers (2023-11-04T02:50:40Z) - Multiway Spherical Clustering via Degree-Corrected Tensor Block Models [8.147652597876862]
We develop a degree-corrected block model with estimation accuracy guarantees.
In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater.
The efficacy of our procedure is demonstrated through two data applications.
arXiv Detail & Related papers (2022-01-19T03:40:22Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit [10.8145995157397]
High-order clustering aims to identify heterogeneous substructure in multiway dataset.
Non- computation and nature of the problem poses significant challenges in both statistics and statistics.
arXiv Detail & Related papers (2020-12-18T00:48:27Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW)
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Statistical and computational thresholds for the planted $k$-densest
sub-hypergraph problem [13.808053718325635]
We consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs.
This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications.
arXiv Detail & Related papers (2020-11-23T16:02:12Z) - Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures.
For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree.
As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.