Support Recovery in Sparse PCA with Non-Random Missing Data
- URL: http://arxiv.org/abs/2302.01535v1
- Date: Fri, 3 Feb 2023 04:20:25 GMT
- Title: Support Recovery in Sparse PCA with Non-Random Missing Data
- Authors: Hanbyul Lee, Qifan Song, Jean Honorio
- Abstract summary: We analyze a practical algorithm for sparse PCA on incomplete and noisy data under a general non-random sampling scheme.
We provide theoretical justification that under certain conditions, we can recover the support of the sparse leading eigenvector with high probability.
We show that our algorithm outperforms several other sparse PCA approaches, especially when the observed entries have good structural properties.
- Score: 27.3669650952144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze a practical algorithm for sparse PCA on incomplete and noisy data
under a general non-random sampling scheme. The algorithm is based on a
semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide
theoretical justification that under certain conditions, we can recover the
support of the sparse leading eigenvector with high probability by obtaining a
unique solution. The conditions involve the spectral gap between the largest
and second-largest eigenvalues of the true data matrix, the magnitude of the
noise, and the structural properties of the observed entries. The concepts of
algebraic connectivity and irregularity are used to describe the structural
properties of the observed entries. We empirically justify our theorem with
synthetic and real data analysis. We also show that our algorithm outperforms
several other sparse PCA approaches, especially when the observed entries have
good structural properties. As a by-product of our analysis, we provide two
theorems to handle a deterministic sampling scheme, which can be applied to
other matrix-related problems.
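To make the setup concrete, here is a minimal Python sketch of the kind of $\ell_1$-regularized semidefinite relaxation the abstract describes, together with the algebraic connectivity of the observation pattern that the recovery conditions refer to. The objective scaling, the way the masked matrix is formed, and the graph built from the observed entries are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch of an l1-regularized SDP relaxation of sparse PCA on
# partially observed data. Function names, the masking convention, and
# the observation graph are illustrative assumptions.
import numpy as np
import cvxpy as cp

def sparse_pca_sdp_support(A_obs, mask, rho, tol=1e-6):
    """A_obs: n x n symmetric data matrix with unobserved entries zeroed,
    mask: boolean matrix of observed entries, rho: l1 penalty weight.
    Returns the estimated support of the sparse leading eigenvector."""
    n = A_obs.shape[0]
    S = A_obs * mask  # keep only the observed entries
    X = cp.Variable((n, n), symmetric=True)
    # Standard l1-penalized SDP relaxation: maximize <S, X> - rho*||X||_1
    # over the spectraplex {X >= 0, tr(X) = 1}.
    objective = cp.Maximize(cp.trace(S @ X) - rho * cp.sum(cp.abs(X)))
    problem = cp.Problem(objective, [X >> 0, cp.trace(X) == 1])
    problem.solve()
    # Support estimate: coordinates carrying non-negligible diagonal mass.
    return np.where(np.diag(X.value) > tol)[0]

def algebraic_connectivity(mask):
    """Second-smallest Laplacian eigenvalue (Fiedler value) of the graph
    whose adjacency pattern is the set of observed entries."""
    A = mask.astype(float)
    A = np.maximum(A, A.T)      # symmetrize the observation pattern
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)[1]  # eigvalsh sorts ascending
```

Roughly, the abstract's guarantee is that under a sufficient spectral gap, bounded noise, and an observation pattern with large algebraic connectivity and low irregularity, the SDP has a unique rank-one solution whose diagonal is supported on the true support, which the thresholding step then reads off.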
Related papers
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
- Bayes-optimal limits in structured PCA, and how to reach them [21.3083877172595]
We study the paradigmatic matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
We provide the first characterization of the Bayes-optimal limits of inference in this model.
We propose a novel approximate message passing algorithm (AMP), inspired by the theory of Adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2022-10-03T21:31:41Z)
- Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we make the first attempt at the algorithm-dependent generalization analysis of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z)
- Support Recovery in Sparse PCA with Incomplete Data [27.3669650952144]
We analyze a practical algorithm for principal component analysis (PCA) on incomplete data.
We provide theoretical and experimental evidence that a unique solution of the semidefinite relaxation can exactly recover the support of the sparse leading eigenvector.
We validate our theoretical results on incomplete synthetic and real data, and show encouraging results on explained variance.
arXiv Detail & Related papers (2022-05-30T16:17:39Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt to combine partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
- Robust Principal Component Analysis: A Median of Means Approach [17.446104539598895]
Principal Component Analysis is a tool for data visualization, denoising, and dimensionality reduction.
Recent supervised learning methods have shown great success in dealing with outlying observations.
This paper proposes a PCA procedure based on the Median of Means (MoM) principle.
arXiv Detail & Related papers (2021-02-05T19:59:05Z)
- A Linearly Convergent Algorithm for Distributed Principal Component Analysis [12.91948651812873]
This paper introduces a feedforward neural network-based, one-time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA); a minimal single-node sketch of the underlying Sanger's rule appears after this list.
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
arXiv Detail & Related papers (2021-01-05T00:51:14Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Exact and Approximation Algorithms for Sparse PCA [1.7640556247739623]
This paper proposes two exact mixed-integer SDPs (MISDPs) for sparse PCA (SPCA).
We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-the-art one.
Since off-the-shelf solvers, in general, have difficulty in solving MISDPs, we approximate SPCA with arbitrary accuracy by a mixed-integer linear program (MILP) of similar size to the MISDPs.
arXiv Detail & Related papers (2020-08-28T02:07:08Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
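As a companion to the Distributed Sanger's Algorithm entry above, the following is a minimal single-node sketch of the classical Sanger's rule (Generalized Hebbian Algorithm) that DSA builds on. DSA additionally mixes each node's iterate with its network neighbors' iterates, which is omitted here; the step size and toy data are illustrative.

```python
# Hedged sketch: classical (centralized) Sanger's rule for streaming PCA.
# DSA distributes this update over a network of nodes; the consensus /
# mixing step is omitted here.
import numpy as np

def sanger_step(W, x, lr):
    """One Sanger's-rule update. W: d x k matrix whose columns converge to
    the top-k principal directions; x: a single (centered) sample."""
    y = W.T @ x  # k-dim projection of the sample
    # triu(y y^T) makes component j orthogonalize only against components
    # 1..j, which distinguishes Sanger's rule from plain Hebbian PCA.
    return W + lr * (np.outer(x, y) - W @ np.triu(np.outer(y, y)))

# Toy usage: coordinates with decreasing variance, so the leading
# principal directions are close to the first standard basis vectors.
rng = np.random.default_rng(0)
d, k = 8, 2
X = rng.standard_normal((20000, d)) * np.linspace(3.0, 0.5, d)
X -= X.mean(axis=0)
W = 0.1 * rng.standard_normal((d, k))
for x in X:
    W = sanger_step(W, x, lr=1e-3)
print(np.round(W[:, 0], 2))  # approx +/- e_1 up to sign
```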