Support Recovery in Sparse PCA with Incomplete Data
- URL: http://arxiv.org/abs/2205.15215v1
- Date: Mon, 30 May 2022 16:17:39 GMT
- Title: Support Recovery in Sparse PCA with Incomplete Data
- Authors: Hanbyul Lee, Qifan Song, Jean Honorio
- Abstract summary: We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data.
We provide theoretical and experimental evidence that the SDP relaxation enables us to exactly recover the true support of the sparse leading eigenvector of the unknown true matrix.
We validate our theoretical results with incomplete synthetic data, and show encouraging and meaningful results on a gene expression dataset.
- Score: 27.3669650952144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a practical algorithm for sparse principal component analysis (PCA)
of incomplete and noisy data. Our algorithm is based on the semidefinite
program (SDP) relaxation of the non-convex $l_1$-regularized PCA problem. We
provide theoretical and experimental evidence that SDP enables us to exactly
recover the true support of the sparse leading eigenvector of the unknown true
matrix, despite only observing an incomplete (missing uniformly at random) and
noisy version of it. We derive sufficient conditions for exact recovery, which
involve matrix incoherence, the spectral gap between the largest and
second-largest eigenvalues, the observation probability and the noise variance.
We validate our theoretical results with incomplete synthetic data, and show
encouraging and meaningful results on a gene expression dataset.
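Below is a minimal Python sketch (using numpy and cvxpy) of the kind of SDP relaxation the abstract describes, assuming entries are missing uniformly at random with a known observation probability. The surrogate-matrix construction, the regularization level, the support threshold, and the function names are illustrative assumptions, not the paper's exact procedure.
```python
import numpy as np
import cvxpy as cp

def estimate_matrix(y_observed, mask, obs_prob):
    # Inverse-probability rescaling gives an entrywise-unbiased surrogate
    # for the unknown true matrix (entries missing uniformly at random).
    return (y_observed * mask) / obs_prob

def sparse_pca_sdp(a_hat, lam):
    # SDP relaxation of l1-regularized PCA:
    #   maximize <A_hat, X> - lam * ||X||_1   s.t.  trace(X) = 1, X PSD.
    p = a_hat.shape[0]
    X = cp.Variable((p, p), PSD=True)
    obj = cp.Maximize(cp.sum(cp.multiply(a_hat, X)) - lam * cp.sum(cp.abs(X)))
    cp.Problem(obj, [cp.trace(X) == 1]).solve()
    return X.value

# Toy instance: rank-one spike with a sparse leading eigenvector, plus noise.
rng = np.random.default_rng(0)
p, obs_prob = 20, 0.8
v = np.zeros(p)
v[:3] = 1 / np.sqrt(3)                      # sparse leading eigenvector
A = 5.0 * np.outer(v, v)                    # unknown true matrix
noise = 0.1 * rng.standard_normal((p, p))
Y = A + (noise + noise.T) / 2               # noisy symmetric observation
mask = rng.random((p, p)) < obs_prob
mask = np.triu(mask) | np.triu(mask, 1).T   # symmetric missing pattern
X_hat = sparse_pca_sdp(estimate_matrix(Y, mask, obs_prob), lam=0.2)
print("recovered support:", np.where(np.diag(X_hat) > 1e-3)[0])
```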
Related papers
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets (see the sketch after this list).
arXiv Detail & Related papers (2024-05-23T12:26:25Z) - Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that one estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - Support Recovery in Sparse PCA with Non-Random Missing Data [27.3669650952144]
We analyze a practical algorithm for sparse PCA on incomplete and noisy data under a general non-random sampling scheme.
We provide theoretical justification that under certain conditions, we can recover the support of the sparse leading eigenvector with high probability.
We show that our algorithm outperforms several other sparse PCA approaches especially when the observed entries have good structural properties.
arXiv Detail & Related papers (2023-02-03T04:20:25Z) - Bayes-optimal limits in structured PCA, and how to reach them [21.3083877172595]
We study the paradigmatic matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
We provide the first characterization of the Bayes-optimal limits of inference in this model.
We propose a novel approximate message passing algorithm (AMP), inspired by the theory of Adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2022-10-03T21:31:41Z) - Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Robust Principal Component Analysis: A Median of Means Approach [17.446104539598895]
Principal Component Analysis is a tool for data visualization, denoising, and dimensionality reduction.
Recent supervised learning methods have shown great success in dealing with outlying observations.
This paper proposes a PCA procedure based on the MoM principle (see the sketch after this list).
arXiv Detail & Related papers (2021-02-05T19:59:05Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE (see the sketch after this list).
arXiv Detail & Related papers (2020-10-10T10:10:28Z) - Exact and Approximation Algorithms for Sparse PCA [1.7640556247739623]
This paper proposes two exact mixed-integer SDPs (MISDPs).
We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-the-art one.
Since off-the-shelf solvers, in general, have difficulty in solving MISDPs, we approximate SPCA with arbitrary accuracy by a mixed-integer linear program (MILP) of a similar size as MISDPs.
arXiv Detail & Related papers (2020-08-28T02:07:08Z) - Robust Compressed Sensing using Generative Models [98.64228459705859]
In this paper, we propose an algorithm inspired by the Median-of-Means (MoM) principle.
Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers.
arXiv Detail & Related papers (2020-06-16T19:07:41Z) - Bridging Convex and Nonconvex Optimization in Robust PCA: Noise,
Outliers, and Missing Data [20.31071912733189]
This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation.
We show that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_\infty$ loss (see the sketch after this list).
arXiv Detail & Related papers (2020-01-15T18:54:29Z)
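A minimal numpy sketch of the quantity studied in "Entrywise error bounds for low-rank approximations of kernel matrices" above: the entrywise error of a rank-r truncated eigendecomposition. The RBF kernel and the synthetic data are assumptions made for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))

# Gaussian (RBF) kernel matrix on the sample.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)

# Truncated eigendecomposition: keep the r largest eigenvalues/eigenvectors.
r = 10
eigvals, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order
U, s = eigvecs[:, -r:], eigvals[-r:]
K_r = (U * s) @ U.T                         # rank-r approximation of K

# The entrywise (max-abs) error is the object the paper's bounds control.
print("entrywise error:", np.abs(K - K_r).max())
print("Frobenius error:", np.linalg.norm(K - K_r))
```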
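A minimal sketch for "Robust Principal Component Analysis: A Median of Means Approach" above, assuming the MoM principle is applied entrywise to per-block covariance estimates; the paper's exact construction may differ, and the function name is mine.
```python
import numpy as np

def mom_pca(X, n_blocks, n_components=1, seed=0):
    # Split the sample into disjoint blocks, estimate a covariance matrix per
    # block, take the entrywise median across blocks, then eigendecompose.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(X)), n_blocks)
    covs = np.stack([np.cov(X[b], rowvar=False) for b in blocks])
    cov_mom = np.median(covs, axis=0)       # robust to a few outlying blocks
    _, eigvecs = np.linalg.eigh(cov_mom)
    # Columns ordered from largest eigenvalue downward.
    return eigvecs[:, -n_components:][:, ::-1]

# Inliers with one dominant direction, plus gross outliers that would
# distort a plain-PCA covariance estimate.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
X[:10] += 50 * rng.standard_normal((10, 5))
print(mom_pca(X, n_blocks=20).ravel())
```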
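A generic sketch for "Effective Data-aware Covariance Estimator from Compressed Data" above, illustrating only the unbiasedness mechanism behind data-aware weighted sampling (inverse-probability rescaling); DACE's actual estimator for compressed data is more involved.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5)) * np.array([3.0, 2.0, 1.0, 1.0, 1.0])
n = len(X)

# Data-aware weights: sample rows proportionally to their squared norm.
probs = (X ** 2).sum(axis=1)
probs /= probs.sum()
idx = rng.choice(n, size=100, p=probs, replace=True)

# Inverse-probability rescaling keeps the estimator unbiased:
# E[ x_i x_i^T / (n p_i) ] = (1/n) sum_j x_j x_j^T.
C_hat = np.mean([np.outer(X[i], X[i]) / (n * probs[i]) for i in idx], axis=0)

C_full = X.T @ X / n                        # full-data second-moment matrix
print("relative error:",
      np.linalg.norm(C_hat - C_full) / np.linalg.norm(C_full))
```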
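A minimal cvxpy sketch for "Bridging Convex and Nonconvex Optimization in Robust PCA" above: the classical nuclear-norm-plus-$l_1$ convex program with missing data. The $1/\sqrt{n}$ regularization choice is a standard assumption, not the paper's tuning.
```python
import numpy as np
import cvxpy as cp

def robust_pca(Y, mask, lam):
    # Convex program: minimize ||L||_* + lam * ||S||_1
    # subject to L + S agreeing with Y on the observed entries.
    L = cp.Variable(Y.shape)
    S = cp.Variable(Y.shape)
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S)))
    constraints = [cp.multiply(mask, L + S) == cp.multiply(mask, Y)]
    cp.Problem(obj, constraints).solve()
    return L.value, S.value

rng = np.random.default_rng(0)
n, r = 30, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # low rank
S0 = np.zeros((n, n))
S0[rng.random((n, n)) < 0.05] = 10.0        # sparse gross corruptions
mask = (rng.random((n, n)) < 0.8).astype(float)  # 20% entries missing
L_hat, _ = robust_pca(L0 + S0, mask, lam=1 / np.sqrt(n))
print("relative error:", np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```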
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.