Probabilistic Contrastive Principal Component Analysis
- URL: http://arxiv.org/abs/2012.07977v2
- Date: Fri, 30 Apr 2021 22:53:49 GMT
- Title: Probabilistic Contrastive Principal Component Analysis
- Authors: Didong Li, Andrew Jones and Barbara Engelhardt
- Abstract summary: We propose a model-based alternative to contrastive principal component analysis (CPCA).
We show PCPCA's advantages over CPCA, including greater interpretability, uncertainty quantification and principled inference.
We demonstrate PCPCA's performance through a series of simulations and case-control experiments with datasets of gene expression, protein expression, and images.
- Score: 0.5286651840245514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimension reduction is useful for exploratory data analysis. In many
applications, it is of interest to discover variation that is enriched in a
"foreground" dataset relative to a "background" dataset. Recently, contrastive
principal component analysis (CPCA) was proposed for this setting. However, the
lack of a formal probabilistic model makes it difficult to reason about CPCA
and to tune its hyperparameter. In this work, we propose probabilistic
contrastive principal component analysis (PCPCA), a model-based alternative to
CPCA. We discuss how to set the hyperparameter in theory and in practice, and
we show several of PCPCA's advantages over CPCA, including greater
interpretability, uncertainty quantification and principled inference,
robustness to noise and missing data, and the ability to generate data from the
model. We demonstrate PCPCA's performance through a series of simulations and
case-control experiments with datasets of gene expression, protein expression,
and images.
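For concreteness, here is a minimal NumPy sketch of the CPCA estimator that PCPCA builds on: the top eigenvectors of the contrast matrix C_fg - gamma * C_bg, where gamma is the hyperparameter the paper discusses how to set. The function name, defaults, and projection convention are illustrative, not taken from the paper.

```python
import numpy as np

def cpca(X_fg, X_bg, gamma=1.0, n_components=2):
    """Contrastive PCA sketch: project foreground data onto the top
    eigenvectors of C_fg - gamma * C_bg.

    X_fg : (n, d) foreground samples; X_bg : (m, d) background samples.
    gamma trades off foreground variance against background variance.
    """
    C_fg = np.cov(X_fg, rowvar=False)
    C_bg = np.cov(X_bg, rowvar=False)
    contrast = C_fg - gamma * C_bg            # symmetric, possibly indefinite
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1]         # eigh sorts ascending; flip
    W = eigvecs[:, order[:n_components]]      # (d, n_components) loadings
    return (X_fg - X_fg.mean(axis=0)) @ W     # low-dimensional foreground scores
```

PCPCA recasts this contrastive idea as maximum-likelihood estimation in a probabilistic factor model, which is what yields the likelihood-based hyperparameter guidance, uncertainty quantification, missing-data handling, and generative sampling claimed in the abstract.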
Related papers
- ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization [17.771454131646312]
Principal component analysis is a key tool in the field of data dimensionality reduction.
This paper develops a PCA method that can estimate the sample-wise noise variances.
This is done without distributional assumptions on the low-rank component and without assuming the noise variances are known.
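As a rough illustration of the joint estimation problem (not ALPCAH itself, whose tail singular value regularization is not reproduced here), one could alternate between a variance-weighted subspace update and per-sample residual variance updates; the whitening heuristic below is an assumption for illustration only.

```python
import numpy as np

def heteroscedastic_pca_sketch(X, rank, n_iter=20):
    """Alternate between (1) a subspace estimate that down-weights noisy
    samples and (2) per-sample noise variance estimates from residuals.
    Illustrative heuristic only, not the ALPCAH algorithm.

    X : (n, d) data matrix, one sample per row.
    """
    n, d = X.shape
    sigma2 = np.ones(n)                            # per-sample noise variances
    for _ in range(n_iter):
        Xw = X / np.sqrt(sigma2)[:, None]          # whiten rows by noise level
        _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
        V = Vt[:rank].T                            # (d, rank) subspace basis
        resid = X - (X @ V) @ V.T                  # off-subspace residuals
        sigma2 = np.maximum((resid ** 2).mean(axis=1), 1e-8)
    return V, sigma2
```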
arXiv Detail & Related papers (2023-07-06T03:11:11Z)
- PLPCA: Persistent Laplacian Enhanced-PCA for Microarray Data Analysis [5.992724190105578]
We propose Persistent Laplacian-enhanced Principal Component Analysis (PLPCA)
PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory.
In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and incorporate higher-order simplicial complexes.
arXiv Detail & Related papers (2023-06-09T22:48:14Z)
- An online algorithm for contrastive Principal Component Analysis [9.090031210111919]
We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy-efficient neuromorphic hardware.
We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
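The paper's local learning rules are not reproduced here; as a generic illustration of the online setting, an Oja-style stochastic update can track the leading eigenvector of the contrast matrix from interleaved foreground/background streams (the stream arguments, step size, and step count are all hypothetical).

```python
import numpy as np

def online_contrastive_direction(fg_stream, bg_stream, d, gamma=1.0,
                                 lr=1e-2, n_steps=5000, seed=0):
    """Oja-style stochastic approximation of the top eigenvector of
    C_fg - gamma * C_bg. Illustrative sketch only; the paper derives
    different, neurally plausible local learning rules.

    fg_stream, bg_stream : iterators yielding d-dimensional sample vectors.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(n_steps):
        x, y = next(fg_stream), next(bg_stream)
        # One-sample stochastic estimate of (C_fg - gamma * C_bg) @ w.
        w += lr * (x * (x @ w) - gamma * y * (y @ w))
        w /= np.linalg.norm(w)                 # renormalize, as in Oja's rule
    return w
```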
arXiv Detail & Related papers (2022-11-14T19:48:48Z)
- coVariance Neural Networks [119.45320143101381]
Graph neural networks (GNNs) are an effective framework that exploits inter-relationships within graph-structured data for learning.
We propose a GNN architecture, called coVariance neural network (VNN), that operates on sample covariance matrices as graphs.
We show that VNN performance is indeed more stable than that of PCA-based statistical approaches.
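The core operation is a coVariance filter: a polynomial in the sample covariance matrix applied to an input signal, with the covariance matrix playing the role of the graph shift operator. A minimal sketch, with the filter taps h assumed fixed here rather than learned as in a VNN:

```python
import numpy as np

def covariance_filter(C, x, h):
    """coVariance filter: z = sum_k h[k] * (C^k @ x).

    C : (d, d) sample covariance matrix (the 'graph').
    x : (d,) input signal; h : sequence of filter taps h[0..K].
    """
    z = np.zeros_like(x)
    Ckx = x.copy()                  # running value of C^k @ x, starting at k=0
    for hk in h:
        z += hk * Ckx
        Ckx = C @ Ckx               # advance to C^{k+1} @ x
    return z

# A VNN layer stacks such filters per channel and applies a pointwise
# nonlinearity, e.g. np.tanh(covariance_filter(C, x, h)).
```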
arXiv Detail & Related papers (2022-05-31T15:04:43Z)
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
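Plain (unpenalized) CCA is available off the shelf; a small sketch of two-modality fusion using scikit-learn's CCA, where the synthetic data stand in for histology and genomic features:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two modalities measured on the same 200 samples; the latent structure,
# feature counts, and noise level below are illustrative only.
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 2))                    # shared latent factors
X = shared @ rng.standard_normal((2, 50)) + 0.1 * rng.standard_normal((200, 50))
Y = shared @ rng.standard_normal((2, 30)) + 0.1 * rng.standard_normal((200, 30))

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)       # maximally correlated canonical scores
fused = np.hstack([X_c, Y_c])            # fused features for a downstream model
```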
arXiv Detail & Related papers (2021-11-27T21:18:01Z)
- Capturing patterns of variation unique to a specific dataset [68.8204255655161]
We propose a tuning-free method that identifies low-dimensional representations of a target dataset relative to one or more comparison datasets.
We show in several experiments that UCA with a single background dataset achieves results similar to those of cPCA across its various tuning parameters.
arXiv Detail & Related papers (2021-04-16T15:07:32Z)
- Enhanced Principal Component Analysis under A Collaborative-Robust Framework [89.28334359066258]
We introduce a general collaborative-robust weight learning framework that combines weight learning and robust loss in a non-trivial way.
Under the proposed framework, only a subset of well-fitting samples is activated and given more importance during training, while the remaining samples, whose errors are large, are not simply ignored.
In particular, the negative effects of the inactivated samples are alleviated by the robust loss function.
arXiv Detail & Related papers (2021-03-22T15:17:37Z)
- Probabilistic Generating Circuits [50.98473654244851]
We propose probabilistic generating circuits (PGCs) for efficiently representing probability generating polynomials.
PGCs are not just a theoretical framework that unifies vastly different existing models; they also show great potential for modeling realistic data.
We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of probabilistic circuits (PCs) and determinantal point processes (DPPs), and obtain competitive performance on a suite of density estimation benchmarks.
arXiv Detail & Related papers (2021-02-19T07:06:53Z)
- Supervised PCA: A Multiobjective Approach [70.99924195791532]
We propose a new method for supervised principal component analysis (SPCA) that addresses its two objectives, dimension reduction and supervised prediction, jointly.
Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.
arXiv Detail & Related papers (2020-11-10T18:46:58Z)
- Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering [9.042239247913642]
This article focuses on improving upon PCA and k-means by allowing nonlinear relations in the data and more flexible cluster shapes.
The key contribution is a new framework, Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA.
In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
arXiv Detail & Related papers (2020-08-17T06:25:50Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than traditional EPCA.
The traditional mixture of local EPCAs suffers from model redundancy, i.e., overlaps among mixing components, which may cause ambiguity in data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.