Empirical Bayes Covariance Decomposition, and a solution to the Multiple Tuning Problem in Sparse PCA
- URL: http://arxiv.org/abs/2312.03274v1
- Date: Wed, 6 Dec 2023 04:00:42 GMT
- Title: Empirical Bayes Covariance Decomposition, and a solution to the Multiple Tuning Problem in Sparse PCA
- Authors: Joonsuk Kang, Matthew Stephens
- Abstract summary: Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA.
We present a solution to the "multiple tuning problem" using Empirical Bayes methods.
- Score: 2.5382095320488673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse Principal Components Analysis (PCA) has been proposed as a way to
improve both interpretability and reliability of PCA. However, use of sparse
PCA in practice is hindered by the difficulty of tuning the multiple
hyperparameters that control the sparsity of different PCs (the "multiple
tuning problem", MTP). Here we present a solution to the MTP using Empirical
Bayes methods. We first introduce a general formulation for penalized PCA of a
data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as
special cases. We show that this formulation also leads to a penalized
decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We
introduce empirical Bayes versions of these penalized problems, in which the
penalties are determined by prior distributions that are estimated from the
data by maximum likelihood rather than cross-validation. The resulting
"Empirical Bayes Covariance Decomposition" provides a principled and efficient
solution to the MTP in sparse PCA, and one that can be immediately extended to
incorporate other structural assumptions (e.g. non-negative PCA). We illustrate
the effectiveness of this approach on both simulated and real data examples.
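To make the setup concrete, the following display sketches the general shape of the objective the abstract describes; the notation (rank $K$, factors $\mathbf{L}$ and $\mathbf{F}$, per-component penalties $\rho_k$) is illustrative rather than the paper's exact formulation:

```latex
% A sketch of a penalized rank-K PCA objective of the general kind described
% in the abstract (illustrative notation, not the paper's exact formulation).
% Each component carries its own penalty, which is what creates the
% "multiple tuning problem" when the penalty strengths are set by hand.
\[
\min_{\mathbf{L},\,\mathbf{F}} \;
  \tfrac{1}{2}\,\bigl\|\mathbf{X}-\mathbf{L}\mathbf{F}^{T}\bigr\|_{F}^{2}
  \;+\; \sum_{k=1}^{K} \rho_{k}(\mathbf{f}_{k})
\]
```

In the empirical Bayes version, each penalty $\rho_k$ is induced by a prior $g_k$ that is estimated from the data by maximum likelihood, so no penalty strength has to be tuned by cross-validation.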
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that one estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
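For context on the entry above: the "standard semidefinite relaxation of sparse PCA" it compares against can be written down in a few lines. Below is a minimal cvxpy sketch of that baseline only (the paper's novel regularizations are not reproduced; the penalty weight `rho` and the toy data are illustrative assumptions):

```python
# Minimal sketch of the standard SDP relaxation of sparse PCA -- the baseline
# the entry above sharpens, NOT the paper's family of estimators.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
S = X.T @ X / X.shape[0]      # sample covariance (Gram matrix / n)
rho = 0.1                     # illustrative l1 penalty weight (an assumption)

Z = cp.Variable((10, 10), PSD=True)   # relaxation of the rank-1 matrix v v^T
problem = cp.Problem(
    cp.Maximize(cp.trace(S @ Z) - rho * cp.sum(cp.abs(Z))),
    [cp.trace(Z) == 1],
)
problem.solve()

# Recover an approximate sparse loading vector as the top eigenvector of Z.
eigvals, eigvecs = np.linalg.eigh(Z.value)
v = eigvecs[:, -1]
```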
- ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization [17.771454131646312]
Principal component analysis is a key tool in the field of data dimensionality reduction.
This paper develops a PCA method that can estimate the sample-wise noise variances.
This is done without distributional assumptions on the low-rank component and without assuming the noise variances are known.
arXiv Detail & Related papers (2023-07-06T03:11:11Z)
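The ALPCAH entry above concerns PCA when each sample has its own unknown noise variance. As a generic illustration of that problem setup (a simple alternating heuristic written for this digest, NOT the ALPCAH algorithm), one can alternate between a variance-weighted subspace fit and re-estimating per-sample variances from residuals:

```python
# Generic alternating heuristic for PCA with unknown sample-wise noise
# variances (an illustration of the problem setup, NOT the ALPCAH method).
import numpy as np

def heteroscedastic_pca(X, rank, iters=20):
    n, d = X.shape
    var = np.ones(n)                              # per-sample noise variances
    for _ in range(iters):
        # Down-weight noisy samples, then fit the subspace by truncated SVD.
        W = X / np.sqrt(var)[:, None]
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        basis = Vt[:rank]                         # (rank, d) orthonormal rows
        # Re-estimate each sample's variance from its off-subspace residual.
        resid = X - X @ basis.T @ basis
        var = np.mean(resid**2, axis=1) + 1e-8    # floor avoids divide-by-zero
    return basis, var
```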
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier over a restricted region of the ROC curve.
Most existing methods can only optimize PAUC approximately, leading to inevitable, uncontrollable biases.
We present a simpler reformulation of the PAUC optimization problem via distributionally robust optimization (DRO).
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
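As a concrete reference for the entry above: one-way partial AUC restricts attention to a false-positive-rate range [0, α], while the two-way version additionally restricts the true-positive rate. scikit-learn exposes the standardized one-way version via `max_fpr`; the labels and scores below are synthetic illustrations only:

```python
# One-way partial AUC (OPAUC) over FPR in [0, 0.3], using scikit-learn's
# standardized partial AUC; labels and scores are synthetic illustrations.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)               # binary labels
y_score = y_true + rng.normal(scale=1.0, size=1000)  # noisy classifier scores

opauc = roc_auc_score(y_true, y_score, max_fpr=0.3)
print(f"standardized OPAUC (FPR <= 0.3): {opauc:.3f}")
```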
- Sparse PCA With Multiple Components [2.5382095320488673]
Sparse Principal Component Analysis (sPCA) is a technique for obtaining combinations of features that explain the variance of high-dimensional datasets in an interpretable manner.
Most existing PCA methods do not guarantee the optimality of the resulting solution when we seek multiple sparse PCs.
We propose exact methods and rounding mechanisms that, together, obtain solutions with a bound gap on the order of 0%-15% for real-world datasets.
arXiv Detail & Related papers (2022-09-29T13:57:18Z)
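The exact methods and rounding mechanisms in the entry above are the paper's own; as a generic illustration of the "round a relaxation to a feasible sparse PC" idea, here is a simple hard-threshold-and-refit heuristic (a sketch for this digest, not the paper's mechanism):

```python
# Generic rounding heuristic for a single k-sparse PC (an illustration of
# the rounding idea, NOT the paper's exact methods or rounding mechanisms).
import numpy as np

def round_to_sparse_pc(S, k):
    """Round the leading eigenvector of covariance S to a k-sparse unit vector."""
    _, V = np.linalg.eigh(S)
    v = V[:, -1]                              # dense solution of the relaxation
    support = np.argsort(np.abs(v))[-k:]      # keep the k largest loadings
    # Refit: the best unit vector on this support is the leading eigenvector
    # of the corresponding principal submatrix of S.
    w_sub, V_sub = np.linalg.eigh(S[np.ix_(support, support)])
    v_sparse = np.zeros_like(v)
    v_sparse[support] = V_sub[:, -1]
    return v_sparse, w_sub[-1]                # sparse PC, variance it explains
```

Comparing the variance explained by a rounded solution (a lower bound) against the relaxation's optimal value (an upper bound) is what yields a bound gap of the kind quoted above.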
- A novel approach for Fair Principal Component Analysis based on eigendecomposition [10.203602318836444]
We propose a novel PCA algorithm which tackles fairness issues by means of a simple strategy comprising a one-dimensional search.
Our findings are consistent in several real situations as well as in scenarios with both unbalanced and balanced datasets.
arXiv Detail & Related papers (2022-08-24T08:20:16Z)
- Sparse PCA on fixed-rank matrices [0.05076419064097732]
We show that, if the rank of the covariance matrix is a fixed value, then there is an algorithm that solves sparse PCA to global optimality.
We also prove a similar result for the version of sparse PCA which requires the principal components to have disjoint supports.
arXiv Detail & Related papers (2022-01-07T15:05:32Z)
- AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow [64.81110234990888]
Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction.
In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components, with penalized loadings.
We propose Approximated Gradient Flow (AgFlow) as a fast model selection method for penalized PCA.
arXiv Detail & Related papers (2021-10-07T08:57:46Z)
- FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis [12.91948651812873]
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning.
This paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA).
arXiv Detail & Related papers (2021-08-27T16:10:59Z)
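For the FAST-PCA entry above, a naive point of comparison may help: exact distributed PCA is straightforward if every node can share full Gram matrices, as sketched below; a communication-efficient algorithm like the one the entry describes avoids exactly this blunt approach. The baseline sketch is written for this digest and is NOT the FAST-PCA algorithm:

```python
# Naive exact distributed PCA baseline (illustration only, NOT FAST-PCA):
# nodes share local Gram matrices, which are averaged and eigendecomposed.
import numpy as np

def naive_distributed_pca(local_datasets, rank):
    n_total = sum(X.shape[0] for X in local_datasets)
    # Each node computes X_i^T X_i; summing them (e.g. via consensus
    # averaging in a real network) reconstructs the global covariance.
    S = sum(X.T @ X for X in local_datasets) / n_total
    _, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    return V[:, -rank:][:, ::-1]             # top-`rank` principal directions

rng = np.random.default_rng(0)
nodes = [rng.standard_normal((100, 8)) for _ in range(5)]
components = naive_distributed_pca(nodes, rank=2)
```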
- Enhanced Principal Component Analysis under A Collaborative-Robust Framework [89.28334359066258]
We introduce a general collaborative-robust weight learning framework that combines weight learning and robust loss in a non-trivial way.
Under the proposed framework, only a subset of well-fitting samples is activated and treated as more important during training, while samples with large errors are not simply ignored.
In particular, the negative effects of inactivated samples are alleviated by the robust loss function.
arXiv Detail & Related papers (2021-03-22T15:17:37Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)