An $\ell_p$ theory of PCA and spectral clustering
- URL: http://arxiv.org/abs/2006.14062v3
- Date: Sun, 10 Apr 2022 02:45:31 GMT
- Title: An $\ell_p$ theory of PCA and spectral clustering
- Authors: Emmanuel Abbe, Jianqing Fan, Kaizheng Wang
- Abstract summary: Principal Component Analysis is a powerful tool in statistics and machine learning.
In this paper, we develop an $\ell_p$ theory for a hollowed version of PCA in Hilbert spaces.
For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery.
- Score: 23.90245234027558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal Component Analysis (PCA) is a powerful tool in statistics and
machine learning. While existing studies of PCA focus on the recovery of
principal components and their associated eigenvalues, there are few precise
characterizations of individual principal component scores that yield
low-dimensional embeddings of samples. That hinders the analysis of various
spectral methods. In this paper, we first develop an $\ell_p$ perturbation
theory for a hollowed version of PCA in Hilbert spaces which provably improves
upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel
$\ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of
principal component score vectors and show that they can be approximated by
linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$
and $\ell_\infty$ as special examples. For sub-Gaussian mixture models, the
choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which
further yields optimality guarantees for spectral clustering. For contextual
community detection, the $\ell_p$ theory leads to a simple spectral algorithm
that achieves the information threshold for exact recovery. These also provide
optimal recovery results for Gaussian mixture and stochastic block models as
special cases.
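To make the pipeline concrete, here is a minimal sketch of hollowed PCA followed by k-means, assuming (as the abstract suggests) that "hollowing" means zeroing out the diagonal of the Gram matrix to remove the heteroscedastic noise bias; the function names and the two-component mixture below are our own illustration, not the authors' code.

```python
# A minimal sketch of hollowed PCA for spectral clustering, assuming the
# "hollowed version of PCA" operates on the Gram matrix with its diagonal
# removed; all names here are ours.
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def hollowed_pca_cluster(X, k, seed=0):
    """Cluster the n rows of X into k groups via hollowed PCA."""
    G = X @ X.T                       # n x n Gram matrix
    np.fill_diagonal(G, 0.0)          # hollowing: drop the heteroscedastic diagonal
    vals, vecs = eigsh(G, k=k, which="LA")   # top-k eigenvectors (PC score directions)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(vecs)

# Two-component Gaussian mixture with heteroscedastic noise levels.
rng = np.random.default_rng(0)
n, d = 400, 200
labels = rng.integers(0, 2, n)
mu = np.zeros(d); mu[0] = 3.0
sigma = np.where(labels == 0, 1.0, 2.0)[:, None]    # per-sample noise scale
X = np.where(labels[:, None] == 0, mu, -mu) + sigma * rng.standard_normal((n, d))
pred = hollowed_pca_cluster(X, k=2)
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {acc:.2f}")
```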
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that one estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
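For reference, the baseline these estimators refine is the classical semidefinite relaxation of sparse PCA: maximize $\langle\Sigma, X\rangle$ minus an entrywise $\ell_1$ penalty over the unit-trace PSD cone. Below is a hedged cvxpy sketch of that baseline with a plain $\ell_1$ penalty, not the paper's novel regularizations; all names are ours.

```python
# A generic sketch of the standard semidefinite relaxation of sparse PCA;
# the paper's estimators add novel regularizations on top of this baseline.
import cvxpy as cp
import numpy as np

def sparse_pca_sdp(Sigma, lam):
    p = Sigma.shape[0]
    X = cp.Variable((p, p), symmetric=True)
    objective = cp.Maximize(cp.trace(Sigma @ X) - lam * cp.sum(cp.abs(X)))
    constraints = [X >> 0, cp.trace(X) == 1]    # unit-trace PSD cone (spectraplex)
    cp.Problem(objective, constraints).solve()
    vals, vecs = np.linalg.eigh(X.value)
    return vecs[:, -1]                          # leading eigenvector as the estimate

rng = np.random.default_rng(0)
p, s = 30, 5
v = np.zeros(p); v[:s] = 1 / np.sqrt(s)         # sparse leading eigenvector
Sigma = 4.0 * np.outer(v, v) + np.eye(p)        # spiked covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=200)
v_hat = sparse_pca_sdp(np.cov(X.T), lam=0.1)
print("recovered support size:", np.sum(np.abs(v_hat) > 0.1))
```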
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - A Tighter Analysis of Spectral Clustering, and Beyond [9.759415650727892]
This work studies the classical spectral clustering algorithm, which embeds the vertices of some graph $G=(V_G, E_G)$ into $\mathbb{R}^k$ using $k$ eigenvectors of some matrix of $G$.
Our first result is a tighter analysis of the performance of spectral clustering, and explains why it works under a much weaker condition than the ones studied in the literature.
For the second result, we show that, by applying fewer than $k$ eigenvectors to construct the embedding, spectral clustering is able to produce better output for many practical instances.
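The pipeline analyzed here is the textbook one: embed vertices via eigenvectors of the normalized Laplacian, then run k-means. The second result corresponds to passing `n_eigvecs < k` in the sketch below, which is our own minimal implementation under the usual conventions.

```python
# A textbook sketch of the spectral clustering pipeline this entry analyzes:
# embed vertices with the bottom eigenvectors of the normalized Laplacian,
# then cluster the embedding with k-means.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k, n_eigvecs=None):
    """A: symmetric adjacency matrix; k: number of clusters."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    m = n_eigvecs or k                               # fewer than k is allowed
    _, vecs = eigh(L, subset_by_index=[0, m - 1])    # m smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vecs)

# Two-block stochastic block model example.
rng = np.random.default_rng(1)
n = 200
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.10, 0.02)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T
print(spectral_clustering(A, k=2)[:10])
```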
arXiv Detail & Related papers (2022-08-02T20:18:07Z) - Entrywise Recovery Guarantees for Sparse PCA via Sparsistent Algorithms [9.112172220055431]
We provide entrywise $\ell_{2,\infty}$ bounds for Sparse PCA under a general high-dimensional subgaussian design.
Our results hold for any algorithm that selects the correct support with high probability, i.e., any sparsistent algorithm.
arXiv Detail & Related papers (2022-02-08T18:50:35Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
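One common way to instantiate such a kernel-based estimator is the regularized kernel TD(0) fixed point sketched below. This is a generic construction under our own assumptions (RBF kernel, ridge regularization), not necessarily the exact estimator analyzed in the paper.

```python
# A hedged sketch of a standard kernel-based value estimator: solve the
# regularized TD(0) fixed point V(s_i) - gamma * V(s_i') = r_i with the
# representer form V(.) = sum_j w_j k(., s_j).
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_td_value(S, R, S_next, gamma=0.9, lam=1e-3):
    """S, S_next: (n, d) states and successor states; R: (n,) rewards."""
    K = rbf_kernel(S, S)             # K[i, j] = k(s_i, s_j)
    K_next = rbf_kernel(S_next, S)   # K_next[i, j] = k(s_i', s_j)
    w = np.linalg.solve(K - gamma * K_next + lam * np.eye(len(S)), R)
    return lambda s: rbf_kernel(np.atleast_2d(s), S) @ w   # estimated V(s)

# Usage on synthetic 1-d transitions.
rng = np.random.default_rng(0)
S = rng.standard_normal((100, 1))
S_next = 0.8 * S + 0.1 * rng.standard_normal((100, 1))
V = kernel_td_value(S, np.cos(S[:, 0]), S_next)
print(V(np.zeros(1)))
```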
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$, matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale
of Symmetry [9.695960412426672]
We analytically characterize the Hessian at various families of spurious minima.
In particular, we prove that for $d\ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$.
arXiv Detail & Related papers (2020-08-04T20:08:35Z) - A New Basis for Sparse Principal Component Analysis [5.258928416164104]
Previous versions of sparse principal component analysis presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse.
We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation.
We show that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods.
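The classical way to realize a "sparse after rotation" basis is a varimax rotation of the PCA loadings. The sketch below uses the standard varimax iteration as an illustrative stand-in, since the paper's particular choice of $k \times k$ rotation may differ.

```python
# A minimal sketch of the "sparse after rotation" idea: run vanilla PCA, then
# rotate the p x k loading matrix with varimax before thresholding. Varimax is
# one classical choice of rotation, not necessarily the paper's.
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Classical varimax rotation of a p x k loading matrix."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        u, s, vh = np.linalg.svd(
            Phi.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vh
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ R

# Usage: top-k PCA loadings, rotated toward sparsity.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))
X[:, :5] += 2.0 * rng.standard_normal((500, 1))   # shared factor on 5 coordinates
U, S, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
loadings = Vt[:2].T * S[:2]                       # p x k loading matrix
rotated = varimax(loadings)
print(np.round(rotated[:8], 2))
```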
arXiv Detail & Related papers (2020-07-01T16:32:22Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)