Robust factored principal component analysis for matrix-valued outlier
accommodation and detection
- URL: http://arxiv.org/abs/2112.06760v1
- Date: Mon, 13 Dec 2021 16:12:22 GMT
- Authors: Xuan Ma, Jianhua Zhao, Yue Wang
- Abstract summary: Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data.
We propose a robust extension of FPCA (RFPCA) for matrix data.
RFPCA can adaptively down-weight outliers and yield robust estimates.
- Score: 4.228971753938522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal component analysis (PCA) is a popular dimension reduction technique
for vector data. Factored PCA (FPCA) is a probabilistic extension of PCA for
matrix data, which can substantially reduce the number of parameters in PCA
while yielding satisfactory performance. However, FPCA is based on the Gaussian
assumption and is thus susceptible to outliers. Although the multivariate $t$
distribution as a robust modeling tool for vector data has a very long history,
its application to matrix data is very limited. The main reason is that the
dimension of the vectorized matrix data is often very high and the higher the
dimension, the lower the breakdown point that measures the robustness. To solve
the robustness problem suffered by FPCA and make it applicable to matrix data,
in this paper we propose a robust extension of FPCA (RFPCA), which is built
upon a $t$-type distribution called matrix-variate $t$ distribution. Like the
multivariate $t$ distribution, the matrix-variate $t$ distribution can
adaptively down-weight outliers and yield robust estimates. We develop a fast
EM-type algorithm for parameter estimation. Experiments on synthetic and
real-world datasets reveal that RFPCA compares favorably with several
related methods and is a simple but powerful tool for matrix-valued
outlier detection.
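The adaptive down-weighting described in the abstract can be illustrated by the E-step weights of a matrix-variate $t$ model: each matrix observation receives weight $(\nu + pq)/(\nu + \delta_i)$, where $\delta_i$ is its matrix Mahalanobis distance, so observations far from the bulk are automatically discounted. The sketch below is illustrative only; the function name and the simplified setup are assumptions, not the paper's code.

```python
import numpy as np

def t_weights(X, M, Sigma_r, Sigma_c, nu):
    """E-step weights under a matrix-variate t model (illustrative sketch).

    X: (n, p, q) array of matrix observations; M: (p, q) mean;
    Sigma_r, Sigma_c: row/column covariance matrices; nu: degrees of freedom.
    Each observation gets weight (nu + p*q) / (nu + delta_i), where delta_i
    is the matrix Mahalanobis distance, so outliers (large delta_i) are
    adaptively down-weighted.
    """
    n, p, q = X.shape
    Sr_inv = np.linalg.inv(Sigma_r)
    Sc_inv = np.linalg.inv(Sigma_c)
    R = X - M  # residual matrices
    # delta_i = trace(Sr_inv @ R_i @ Sc_inv @ R_i.T), batched over i
    delta = np.einsum('ab,ibc,cd,iad->i', Sr_inv, R, Sc_inv, R)
    return (nu + p * q) / (nu + delta)
```

As $\nu \to \infty$ the weights tend to 1 and the Gaussian (FPCA) behavior is recovered, which is the usual reason $t$-type models degrade gracefully on clean data.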
Related papers
- Robust bilinear factor analysis based on the matrix-variate $t$
distribution [2.6530267536011913]
Factor analysis based on the $t$ distribution ($t$fa) is useful for extracting common factors from heavy-tailed or contaminated data.
This paper proposes a novel robust factor analysis model, namely bilinear factor analysis built on the $t$ distribution ($t$bfa).
It is capable of simultaneously extracting common factors for both row and column variables of interest on heavy-tailed or contaminated matrix data.
arXiv Detail & Related papers (2024-01-04T11:15:44Z)
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
- Robust Principal Component Analysis using Density Power Divergence [8.057006406834466]
We introduce a novel robust PCA estimator based on the minimum density power divergence estimator.
Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods.
arXiv Detail & Related papers (2023-09-24T02:59:39Z)
- Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
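The inversion-free claim rests on the Woodbury identity: the precision of a factor-analyzer covariance $\Lambda\Lambda^\top + \Psi$ (with diagonal $\Psi$) requires only a small $k \times k$ solve. A minimal numpy sketch of that identity, with assumed names and not the paper's implementation:

```python
import numpy as np

def mfa_precision(Lam, psi):
    """Precision of a factor-analyzer covariance Lam @ Lam.T + diag(psi)
    via the Woodbury identity, using only a k x k linear solve.

    Lam: (d, k) factor loadings; psi: (d,) diagonal noise variances.
    """
    d, k = Lam.shape
    Psi_inv = np.diag(1.0 / psi)
    M = np.eye(k) + Lam.T @ Psi_inv @ Lam  # small k x k matrix
    return Psi_inv - Psi_inv @ Lam @ np.linalg.solve(M, Lam.T @ Psi_inv)
```

Since $k \ll d$ in a typical mixture of factor analyzers, evaluating log-likelihoods (e.g. for outlier scoring) through this form avoids any $d \times d$ inversion.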
arXiv Detail & Related papers (2023-08-26T06:12:33Z)
- Improved Privacy-Preserving PCA Using Optimized Homomorphic Matrix
Multiplication [0.0]
Principal Component Analysis (PCA) is a pivotal technique widely utilized in the realms of machine learning and data analysis.
In recent years, there have been endeavors to utilize homomorphic encryption in privacy-preserving PCA algorithms for the secure cloud computing scenario.
We propose a novel approach to privacy-preserving PCA that addresses these limitations, resulting in superior efficiency, accuracy, and scalability compared to previous approaches.
arXiv Detail & Related papers (2023-05-27T02:51:20Z)
- Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA [43.106438224356175]
We develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees.
We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
arXiv Detail & Related papers (2023-05-04T04:45:16Z)
- Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Unitary Approximate Message Passing for Matrix Factorization [90.84906091118084]
We consider matrix factorization (MF) with certain constraints, which finds wide applications in various areas.
We develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF.
We show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness and computational complexity.
arXiv Detail & Related papers (2022-07-31T12:09:32Z)
- A Linearly Convergent Algorithm for Distributed Principal Component
Analysis [12.91948651812873]
This paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA)
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
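DSA builds on Sanger's rule (the generalized Hebbian algorithm) for neural PCA. A single-node sketch of that underlying update, with the paper's distributed consensus step omitted and all names assumed, might look like:

```python
import numpy as np

def sanger_update(W, x, lr=0.005):
    """One step of Sanger's rule (generalized Hebbian algorithm).

    W: (k, d) weight matrix whose rows converge to the top-k principal
    components of the input distribution; x: (d,) sample; lr: step size.
    Single-node sketch only; DSA adds a consensus/communication step.
    """
    y = W @ x                     # (k,) component activations
    LT = np.tril(np.outer(y, y))  # lower-triangular output Gram matrix
    return W + lr * (np.outer(y, x) - LT @ W)
```

The lower-triangular term deflates each component against the ones above it, which is what lets the rows converge to ordered eigenvectors rather than an arbitrary basis of the principal subspace.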
arXiv Detail & Related papers (2021-01-05T00:51:14Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
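Row-wise norms are what make an $l_{2,p}$ penalty select features: whole rows of the learned projection matrix are driven to zero, and the surviving row norms rank the features. A hypothetical helper illustrating only that final selection step (not the paper's optimization algorithm):

```python
import numpy as np

def rank_features(W):
    """Rank features by the l2 norm of each row of a projection matrix W.

    W: (d, k) projection learned under l_{2,p} regularization, which
    zeroes out whole rows for uninformative features. Returns feature
    indices sorted from most to least important.
    """
    scores = np.linalg.norm(W, axis=1)  # one score per feature (row)
    return np.argsort(scores)[::-1]
```

In practice one keeps the top-$m$ indices and discards the rest, which is why the row-sparsity induced by $p \le 1$ matters: rows that are exactly zero make the cut-off unambiguous.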
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.