ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value
Regularization
- URL: http://arxiv.org/abs/2307.02745v2
- Date: Sun, 12 Nov 2023 15:34:05 GMT
- Title: ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value
Regularization
- Authors: Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano
- Abstract summary: Principal component analysis is a key tool in the field of data dimensionality reduction.
This paper develops a PCA method that can estimate the sample-wise noise variances.
It is done without distributional assumptions of the low-rank component and without assuming the noise variances are known.
- Score: 17.771454131646312
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Principal component analysis (PCA) is a key tool in the field of data
dimensionality reduction that is useful for various data science problems.
However, many applications involve heterogeneous data that varies in quality
due to noise characteristics associated with different sources of the data.
Methods that deal with such mixed-quality datasets are known as heteroscedastic methods.
Current methods like HePPCAT make Gaussian assumptions about the basis
coefficients that may not hold in practice. Other methods, such as Weighted PCA
(WPCA), assume the noise variances are known, which may be difficult to determine in
practice. This paper develops a PCA method that can estimate the sample-wise
noise variances and use this information in the model to improve the estimate
of the subspace basis associated with the low-rank structure of the data. This
is done without distributional assumptions of the low-rank component and
without assuming the noise variances are known. Simulations show the
effectiveness of accounting for such heteroscedasticity in the data, the
benefits of using such a method with all of the data versus retaining only the
good data, and comparisons against other PCA methods established in the
literature, such as PCA, Robust PCA (RPCA), and HePPCAT. Code is available at
https://github.com/javiersc1/ALPCAH
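For intuition, here is a minimal sketch of the general idea of jointly estimating a subspace and per-sample noise variances. This is an illustrative alternating scheme, not ALPCAH's actual algorithm (which uses tail singular value regularization; see the linked repository for the reference implementation): it alternates an inverse-variance-weighted SVD with residual-based variance updates.

```python
import numpy as np

def alternating_heteroscedastic_pca(Y, k, n_iters=50):
    """Illustrative sketch (not the ALPCAH algorithm): alternate between a
    noise-variance-weighted subspace estimate and per-sample variance updates.
    Y is d x n with columns as samples; k is the subspace dimension."""
    d, n = Y.shape
    v = np.ones(n)  # per-sample noise variance estimates
    for _ in range(n_iters):
        # Weighted PCA step: down-weight noisy samples by 1/sqrt(variance).
        U = np.linalg.svd(Y / np.sqrt(v), full_matrices=False)[0][:, :k]
        # Variance step: average residual energy off the current subspace.
        resid = Y - U @ (U.T @ Y)
        v = np.maximum(np.sum(resid**2, axis=0) / (d - k), 1e-8)
    return U, v

# Toy usage: a planted rank-2 subspace with two noise groups.
rng = np.random.default_rng(0)
d, n, k = 50, 200, 2
U_true = np.linalg.qr(rng.standard_normal((d, k)))[0]
sigmas = np.where(np.arange(n) < n // 2, 0.1, 2.0)  # clean vs. noisy samples
Y = U_true @ rng.standard_normal((k, n)) + rng.standard_normal((d, n)) * sigmas
U_hat, v_hat = alternating_heteroscedastic_pca(Y, k)
print(np.linalg.norm(U_true @ U_true.T - U_hat @ U_hat.T))  # subspace error
```

The weighting step is essentially WPCA with estimated (rather than known) variances, which is the gap the paper addresses.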
Related papers
- Empirical Bayes Covariance Decomposition, and a solution to the Multiple
Tuning Problem in Sparse PCA [2.5382095320488673]
Sparse principal component analysis (sparse PCA) has been proposed as a way to improve both the interpretability and reliability of PCA.
We present a solution to the "multiple tuning problem" using Empirical Bayes methods.
arXiv Detail & Related papers (2023-12-06T04:00:42Z)
- HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise [28.24679019484073]
MPPCA assumes the data samples in each mixture component contain homoscedastic noise.
The performance of MPPCA is suboptimal for data with heteroscedastic noise across samples.
This paper proposes a heteroscedastic mixture of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm.
arXiv Detail & Related papers (2023-01-21T02:00:55Z)
- Capturing the Denoising Effect of PCA via Compression Ratio [3.967854215226183]
Principal component analysis (PCA) is one of the most fundamental tools in machine learning.
In this paper, we propose a novel metric called the "compression ratio" to capture the effect of PCA on high-dimensional noisy data.
Building on this new metric, we design a straightforward algorithm that could be used to detect outliers.
arXiv Detail & Related papers (2022-04-22T18:43:47Z)
- Stochastic and Private Nonconvex Outlier-Robust PCA [11.688030627514532]
Outlier-robust PCA seeks an underlying low-dimensional linear subspace from a dataset corrupted with outliers.
Our methods involve a geodesic descent and a novel convergence analysis.
The main application is an effectively private algorithm for outlier-robust PCA.
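The abstract does not spell out the update, so the following is a hedged sketch of a generic Riemannian gradient step with a QR retraction for a robust subspace objective; the least-absolute-deviations loss here is an assumption for illustration, not necessarily the authors' exact formulation.

```python
import numpy as np

def robust_subspace_descent(X, k, step=0.01, n_iters=200, eps=1e-8):
    """Sketch: Riemannian gradient descent for the robust objective
    (1/n) * sum_i ||(I - U U^T) x_i||_2, with a QR retraction standing in
    for an exact geodesic step. X is d x n, columns are samples."""
    d, n = X.shape
    U = np.linalg.qr(np.random.default_rng(1).standard_normal((d, k)))[0]
    for _ in range(n_iters):
        P = U.T @ X                        # k x n subspace coefficients
        R = X - U @ P                      # residuals off the subspace
        r = np.maximum(np.linalg.norm(R, axis=0), eps)
        G = -(X / r) @ P.T / n             # Euclidean gradient, sample-averaged
        G -= U @ (U.T @ G)                 # project onto the tangent space
        U = np.linalg.qr(U - step * G)[0]  # retract to an orthonormal basis
    return U
```

A QR retraction is a common inexpensive surrogate for moving along an exact geodesic on the Grassmannian.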
arXiv Detail & Related papers (2022-03-17T12:00:47Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the common assumption that the noise distribution should match the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution differs from the data distribution and may even belong to a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data, which cause performance degradation of deep neural networks.
We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML).
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
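As a loose illustration only (a plausible instantiation, not PRISM's actual probability model or memory mechanism), one can score how consistent each sample's embedding is with its labeled class and keep the highest-scoring fraction:

```python
import numpy as np

def clean_label_mask(features, labels, keep_ratio=0.8):
    """Hypothetical sketch: estimate p(label is clean) as the softmax
    similarity between a sample's embedding and per-class prototypes,
    then keep the top keep_ratio fraction of samples."""
    classes = np.unique(labels)
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = np.stack([F[labels == c].mean(axis=0) for c in classes])
    sims = F @ protos.T                              # n x C similarities
    probs = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    idx = {c: i for i, c in enumerate(classes)}
    p_clean = probs[np.arange(len(labels)), [idx[c] for c in labels]]
    return p_clean >= np.quantile(p_clean, 1 - keep_ratio)
```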
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
- Capturing patterns of variation unique to a specific dataset [68.8204255655161]
We propose a tuning-free method, UCA, that identifies low-dimensional representations of a target dataset relative to one or more comparison datasets.
We show in several experiments that UCA with a single background dataset achieves similar results compared to cPCA with various tuning parameters.
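For context, the cPCA baseline mentioned above requires tuning a contrast parameter alpha; the sketch below shows standard cPCA (the baseline being compared against, not the proposed tuning-free method):

```python
import numpy as np

def cpca(target, background, k, alpha=1.0):
    """Contrastive PCA: top-k eigenvectors of C_target - alpha * C_background.
    The alpha knob is exactly what a tuning-free method aims to remove."""
    Ct = np.cov(target, rowvar=False)      # rows of `target` are samples
    Cb = np.cov(background, rowvar=False)
    evals, evecs = np.linalg.eigh(Ct - alpha * Cb)
    return evecs[:, np.argsort(evals)[::-1][:k]]  # contrastive directions
```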
arXiv Detail & Related papers (2021-04-16T15:07:32Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive multi-view ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
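A hedged sketch of the per-sample weighting idea follows; the exponential-of-loss weights and the temperature tau are assumptions for illustration, not details taken from the abstract:

```python
import numpy as np

def absgd_style_step(w, velocity, grads, losses, lr=0.1, momentum=0.9, tau=1.0):
    """Sketch: momentum SGD with per-sample importance weights derived from
    losses via a softmax over the mini-batch (assumed form). A positive tau
    emphasizes hard samples (imbalance); a negative tau down-weights
    high-loss samples (label-noise robustness)."""
    z = losses / tau
    weights = np.exp(z - z.max())
    weights /= weights.sum()                     # normalized batch weights
    g = (weights[:, None] * grads).sum(axis=0)   # weighted batch gradient
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity
```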
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode more structural information about the data distribution than traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
- Multi-class Gaussian Process Classification with Noisy Inputs [2.362412515574206]
In some situations, the amount of noise can be known beforehand.
We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data.
arXiv Detail & Related papers (2020-01-28T18:55:13Z)