Empirical Bayes PCA in high dimensions
- URL: http://arxiv.org/abs/2012.11676v1
- Date: Mon, 21 Dec 2020 20:43:44 GMT
- Title: Empirical Bayes PCA in high dimensions
- Authors: Xinyi Zhong and Chang Su and Zhou Fan
- Abstract summary: Principal Components Analysis is known to exhibit problematic phenomena of high-dimensional noise.
We propose an Empirical Bayes PCA method that reduces this noise by estimating a structural prior for the joint distributions of the principal components.
- Score: 11.806200054814772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When the dimension of data is comparable to or larger than the number of
available data samples, Principal Components Analysis (PCA) is known to exhibit
problematic phenomena of high-dimensional noise. In this work, we propose an
Empirical Bayes PCA method that reduces this noise by estimating a structural
prior for the joint distributions of the principal components. This EB-PCA
method is based upon the classical Kiefer-Wolfowitz nonparametric MLE for
empirical Bayes estimation, distributional results derived from random matrix
theory for the sample PCs, and iterative refinement using an Approximate
Message Passing (AMP) algorithm. In theoretical "spiked" models, EB-PCA
achieves Bayes-optimal estimation accuracy in the same settings as the oracle
Bayes AMP procedure that knows the true priors. Empirically, EB-PCA can
substantially improve over PCA when there is strong prior structure, both in
simulation and on several quantitative benchmarks constructed using data from
the 1000 Genomes Project and the International HapMap Project. A final
illustration is presented for an analysis of gene expression data obtained by
single-cell RNA-seq.
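As a rough illustration of the pipeline the abstract describes, the sketch below runs one empirical-Bayes denoising step on the sample PC of a spiked Wigner matrix: a grid-based Kiefer-Wolfowitz NPMLE (fitted by EM) estimates the prior, and a posterior-mean denoiser cleans the PC. This is a simplified sketch, not the authors' implementation: the signal strength `theta` is assumed known, the prior is a hypothetical three-point distribution, and the iterative AMP refinement is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 1500, 1.8                            # dimension, signal strength

# Structured prior on the spike entries: three-point, normalized so E[u^2] = 1.
support = np.array([-np.sqrt(2), 0.0, np.sqrt(2)])
u = rng.choice(support, size=n, p=[0.25, 0.5, 0.25])

# Spiked Wigner observation: Y = (theta/n) u u^T + W / sqrt(n).
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2)
Y = (theta / n) * np.outer(u, u) + W / np.sqrt(n)

# Sample PC, rescaled so its entries are O(1); fix the sign ambiguity.
f = np.sqrt(n) * np.linalg.eigh(Y)[1][:, -1]
if f @ u < 0:
    f = -f

# Random-matrix asymptotics for theta > 1: entrywise f_i ~ mu*u_i + s*g_i,
# with mu^2 = 1 - theta^{-2} and s^2 = 1 - mu^2 (theta assumed known here).
mu = np.sqrt(1 - theta ** -2)
s = np.sqrt(1 - mu ** 2)

# Kiefer-Wolfowitz NPMLE for the prior on a fixed grid, fitted by EM.
grid = np.linspace(-3, 3, 61)
pi = np.full(grid.size, 1.0 / grid.size)
lik = np.exp(-((f[:, None] - mu * grid[None, :]) ** 2) / (2 * s ** 2))
for _ in range(200):
    post = pi * lik
    post /= post.sum(axis=1, keepdims=True)
    pi = post.mean(axis=0)                      # EM update of mixing weights

# Empirical-Bayes posterior-mean denoising of the sample PC.
post = pi * lik
u_hat = (post @ grid) / post.sum(axis=1)

corr = lambda a, b: abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"PCA corr: {corr(f, u):.3f}, EB corr: {corr(u_hat, u):.3f}")
```

Because the three-point prior is strongly non-Gaussian, the posterior-mean denoiser should align with the true spike at least as well as the raw sample PC.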
Related papers
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z) - Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that an estimator within this family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
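This paper's estimators are built on a semidefinite relaxation; as a rough illustration of the sparse-PCA problem itself, the sketch below uses a much simpler truncated power-iteration heuristic (not the paper's method) on synthetic data with a hypothetical k-sparse spike:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, beta = 200, 50, 5, 3.0                 # samples, dims, sparsity, SNR

# k-sparse spike s; rows of X have covariance I + beta * s s^T.
s = np.zeros(d)
s[:k] = 1.0 / np.sqrt(k)
X = rng.standard_normal((n, d)) + np.sqrt(beta) * rng.standard_normal((n, 1)) * s
S = X.T @ X / n                                 # sample covariance

# Truncated power iteration: after each multiply, keep only the k
# largest entries in magnitude and renormalize.
v = np.linalg.eigh(S)[1][:, -1]                 # dense PCA warm start
for _ in range(50):
    w = S @ v
    small = np.argsort(np.abs(w))[:-k]          # indices of the d-k smallest
    w[small] = 0.0
    v = w / np.linalg.norm(w)

print(abs(v @ s))                               # alignment with the true spike
```

Hard truncation enforces exact k-sparsity at every step, which is cheaper than an SDP but comes with weaker statistical guarantees.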
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise [28.24679019484073]
MPPCA assumes the data samples in each mixture contain homoscedastic noise.
The performance of MPPCA is suboptimal for data with heteroscedastic noise across samples.
This paper proposes a heteroscedastic mixtures of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm.
arXiv Detail & Related papers (2023-01-21T02:00:55Z) - Bayes-optimal limits in structured PCA, and how to reach them [21.3083877172595]
We study the paradigmatic matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
We provide the first characterization of the Bayes-optimal limits of inference in this model.
We propose a novel approximate message passing algorithm (AMP), inspired by the theory of Adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2022-10-03T21:31:41Z) - Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation [14.22930572798757]
We propose a probabilistic Bayesian approach for quantitative susceptibility mapping (QSM) with built-in parameter estimation.
On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE, DFCM and the highest SSIM.
On the in vivo datasets, AMP-PE is robust and successfully recovers the susceptibility maps using the estimated parameters.
arXiv Detail & Related papers (2022-07-29T14:38:03Z) - Probabilistic Conformal Prediction Using Conditional Random Samples [73.26753677005331]
PCP is a predictive inference algorithm that estimates a target variable by a discontinuous predictive set.
It is efficient and compatible with either explicit or implicit conditional generative models.
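A minimal sketch of the PCP idea on toy data: draw K samples from a conditional generative model, calibrate a radius so that the union of balls around the samples covers the truth with probability 1 - alpha, then check coverage. The Gaussian "generative model" and all sizes here are hypothetical stand-ins, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, K = 0.1, 20

# Toy conditional generative model: given x, draws K samples y ~ N(x, 1).
def gen_samples(x, K):
    return x + rng.standard_normal(K)

# Calibration: nonconformity score = distance from y to its nearest sample.
n_cal = 500
x_cal = rng.uniform(-2, 2, n_cal)
y_cal = x_cal + rng.standard_normal(n_cal)
scores = np.array([np.min(np.abs(y - gen_samples(x, K)))
                   for x, y in zip(x_cal, y_cal)])
q = np.quantile(scores, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal)

# Predictive set for a new x: union of radius-q balls around the K samples
# (possibly a disconnected, i.e. discontinuous, set). Empirical coverage:
x_te = rng.uniform(-2, 2, 2000)
y_te = x_te + rng.standard_normal(2000)
cover = np.mean([np.min(np.abs(y - gen_samples(x, K))) <= q
                 for x, y in zip(x_te, y_te)])
print(f"coverage: {cover:.3f}")                 # should be near 1 - alpha
```

The conformal calibration step only needs samples from the model, which is why the construction works with implicit generative models as well as explicit ones.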
arXiv Detail & Related papers (2022-06-14T03:58:03Z) - PCA Initialization for Approximate Message Passing in Rotationally Invariant Models [29.039655256171088]
Principal Component Analysis provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime.
Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an alternative estimator with the potential to improve the accuracy of PCA.
In this work, we combine the two methods, initialize AMP with PCA, and propose a rigorous characterization of the performance of this estimator.
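The sketch below shows the basic recipe of PCA-initialized AMP on a rank-one spiked Wigner matrix with a Rademacher (+-1) spike: start from the top eigenvector, then iterate a tanh denoiser with an Onsager correction. The fixed tanh scale is a simplification (a hedged stand-in for the state-evolution-tuned denoiser), and the Wigner setting is simpler than the rotationally invariant models the paper treats.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 1000, 2.0                              # dimension, signal strength

u = rng.choice([-1.0, 1.0], size=n)             # Rademacher spike
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2)
Y = (lam / n) * np.outer(u, u) + W / np.sqrt(n)

overlap = lambda a, b: abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Spectral (PCA) initialization: top eigenvector of Y.
v = np.linalg.eigh(Y)[1][:, -1]
pca_overlap = overlap(v, u)

# AMP iterations: tanh denoiser (posterior mean for the +-1 prior, with a
# simplified fixed scale) plus the Onsager memory-correction term.
x = np.sqrt(n) * v                              # rescale so entries are O(1)
m_old = np.zeros(n)
for _ in range(15):
    m = np.tanh(lam * x)
    b = lam * (1 - m ** 2).mean()               # Onsager coefficient
    x, m_old = Y @ m - b * m_old, m

amp_overlap = overlap(np.tanh(lam * x), u)
print(f"PCA overlap: {pca_overlap:.3f}, AMP overlap: {amp_overlap:.3f}")
```

Starting AMP from the PCA eigenvector sidesteps the uninformative fixed point at zero, which is exactly the role of the spectral initialization studied in the paper.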
arXiv Detail & Related papers (2021-06-04T09:13:51Z) - Statistical Approach to Quantum Phase Estimation [62.92678804023415]
We introduce a new statistical and variational approach to the phase estimation algorithm (PEA).
Unlike the traditional and iterative PEAs which return only an eigenphase estimate, the proposed method can determine any unknown eigenstate-eigenphase pair.
We show the simulation results of the method with the Qiskit package on the IBM Q platform and on a local computer.
arXiv Detail & Related papers (2021-04-21T00:02:00Z) - Improved Dimensionality Reduction of various Datasets using Novel Multiplicative Factoring Principal Component Analysis (MPCA) [0.0]
We present an improvement to the traditional PCA approach called Multiplicative factoring Principal Component Analysis.
The advantage of MPCA over traditional PCA is that a penalty is imposed on the occurrence space through a multiplier, rendering the effect of outliers negligible when seeking out projections.
arXiv Detail & Related papers (2020-09-25T12:30:15Z) - Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.