Subspace clustering in high-dimensions: Phase transitions &
Statistical-to-Computational gap
- URL: http://arxiv.org/abs/2205.13527v1
- Date: Thu, 26 May 2022 17:47:35 GMT
- Title: Subspace clustering in high-dimensions: Phase transitions &
Statistical-to-Computational gap
- Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
- Abstract summary: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model.
We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
- Score: 24.073221004661427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A simple model to study subspace clustering is the high-dimensional
$k$-Gaussian mixture model where the cluster means are sparse vectors. Here we
provide an exact asymptotic characterization of the statistically optimal
reconstruction error in this model in the high-dimensional regime with
extensive sparsity, i.e. when the fraction of non-zero components of the
cluster means $\rho$, as well as the ratio $\alpha$ between the number of
samples and the dimension are fixed, while the dimension diverges. We identify
the information-theoretic threshold below which obtaining a positive
correlation with the true cluster means is statistically impossible.
Additionally, we investigate the performance of the approximate message passing
(AMP) algorithm analyzed via its state evolution, which is conjectured to be
optimal among polynomial algorithms for this task. In particular, we identify
the existence of a statistical-to-computational gap between the algorithmic
threshold $\lambda_{\text{alg}} \ge k / \sqrt{\alpha}$ required to perform
better than random, and the information-theoretic threshold at
$\lambda_{\text{it}} \approx \sqrt{-k \rho \log{\rho}} / \sqrt{\alpha}$.
Finally, we discuss the case of sub-extensive sparsity $\rho$ by comparing the
performance of the AMP with other sparsity-enhancing algorithms, such as
sparse-PCA and diagonal thresholding.
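The two thresholds quoted in the abstract can be compared numerically. The formulas below are taken directly from the abstract; the code itself is only an illustrative sketch showing how the gap between $\lambda_{\text{it}}$ and $\lambda_{\text{alg}}$ widens as the sparsity fraction $\rho$ shrinks:

```python
import math

def lambda_alg(k, alpha):
    # Conjectured algorithmic (AMP) threshold from the abstract: k / sqrt(alpha).
    return k / math.sqrt(alpha)

def lambda_it(k, rho, alpha):
    # Information-theoretic threshold from the abstract:
    # sqrt(-k * rho * log(rho)) / sqrt(alpha).
    return math.sqrt(-k * rho * math.log(rho)) / math.sqrt(alpha)

# As rho -> 0 the information-theoretic threshold vanishes while the
# algorithmic one stays fixed at k / sqrt(alpha): the gap opens up.
k, alpha = 2, 1.0
for rho in (0.5, 0.1, 0.01):
    print(f"rho={rho}: lambda_it={lambda_it(k, rho, alpha):.3f}, "
          f"lambda_alg={lambda_alg(k, alpha):.3f}")
```

For $k = 2$, $\alpha = 1$ and $\rho = 0.01$, for instance, $\lambda_{\text{it}} \approx 0.30$ while $\lambda_{\text{alg}} = 2$, illustrating the regime in which recovery is statistically possible but conjectured to be hard for polynomial-time algorithms.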
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Heteroskedastic Tensor Clustering [20.979358557219953]
We propose a two-stage method, named $\mathsf{High\text{-}order\ HeteroClustering}$ ($\mathsf{HHC}$).
It starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded\ Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster nodes.
Our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit.
arXiv Detail & Related papers (2023-11-04T02:50:40Z) - Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z) - Optimal Clustering by Lloyd Algorithm for Low-Rank Mixture Model [12.868722327487752]
We propose a low-rank mixture model (LrMM) to treat matrix-valued observations.
A computationally efficient clustering method is designed by integrating Lloyd's algorithm and low-rank approximation.
Our method outperforms others in the literature on real-world datasets.
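The LrMM paper's exact procedure is not reproduced here, but the general recipe it summarizes — denoise via a low-rank projection, then cluster with Lloyd's algorithm — can be sketched as follows (function names and the deterministic initialization are illustrative choices, not taken from the paper):

```python
import numpy as np

def _farthest_point_init(Z, k):
    # Deterministic greedy initialization: start from row 0 and repeatedly
    # add the point farthest from the current set of centers.
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(Z[:, None] - Z[idx][None], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return Z[idx].copy()

def lloyd_lowrank(X, k, r, n_iter=100):
    """Project rows of X onto their top-r right singular subspace, then run
    plain Lloyd iterations (k-means) on the projected data."""
    # Low-rank denoising step: keep the top-r right singular directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:r].T                      # (n, r) projected representation
    centers = _farthest_point_init(Z, k)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(Z[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its points.
        new_centers = np.array(
            [Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The projection step is what makes the combination efficient: Lloyd's iterations run in the $r$-dimensional subspace rather than the ambient dimension, which is the computational point the summary above is making.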
arXiv Detail & Related papers (2022-07-11T03:16:10Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Spatially relaxed inference on high-dimensional linear models [48.989769153211995]
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $\delta$-FWER under standard assumptions for $\delta$ equal to the largest cluster diameter.
arXiv Detail & Related papers (2021-06-04T16:37:19Z) - Lattice partition recovery with dyadic CART [79.96359947166592]
We study piece-wise constant signals corrupted by additive Gaussian noise over a $d$-dimensional lattice.
Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature.
In this paper we consider instead the problem of partition recovery, i.e.of estimating the partition of the lattice induced by the constancy regions of the unknown signal.
We prove that a DCART-based procedure consistently estimates the underlying partition at a rate of order $\sigma^2 k^*$
arXiv Detail & Related papers (2021-05-27T23:41:01Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it.
The heuristic depends on the eigenspectrum shape of the dataset, and yields competitive low-rank approximations in tests compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.