Robust Principal Component Analysis using Density Power Divergence
- URL: http://arxiv.org/abs/2309.13531v1
- Date: Sun, 24 Sep 2023 02:59:39 GMT
- Title: Robust Principal Component Analysis using Density Power Divergence
- Authors: Subhrajyoty Roy, Ayanendranath Basu and Abhik Ghosh
- Abstract summary: We introduce a novel robust PCA estimator based on the minimum density power divergence estimator.
Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods.
- Score: 8.057006406834466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Principal component analysis (PCA) is a widely employed statistical tool used
primarily for dimensionality reduction. However, it is known to be adversely
affected by the presence of outlying observations in the sample, which is quite
common. Robust PCA methods using M-estimators have theoretical benefits, but
their robustness drops substantially for high-dimensional data. On the other end
of the spectrum, robust PCA algorithms solving principal component pursuit or
similar optimization problems have high breakdown points, but lack theoretical
richness and demand high computational power compared to the M-estimators. We
introduce a novel robust PCA estimator based on the minimum density power
divergence estimator. This combines the theoretical strength of the
M-estimators and the minimum divergence estimators with a high breakdown
guarantee regardless of data dimension. We present a computationally efficient
algorithm for this estimator. Our theoretical findings are supported by
extensive simulations and comparisons with existing robust PCA methods. We also
showcase the proposed algorithm's applicability on two benchmark datasets and a
credit card transactions dataset for fraud detection.
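For context, the density power divergence (DPD) of Basu et al. (1998), on which the minimum DPD estimator is built, takes the form below; the paper's specific PCA objective is not reproduced here.

```latex
% Density power divergence (Basu et al., 1998) between the data density g
% and a model density f, with tuning parameter \alpha > 0:
\[
  d_\alpha(g, f) = \int \left\{ f^{1+\alpha}(x)
    - \Bigl(1 + \tfrac{1}{\alpha}\Bigr) g(x)\, f^{\alpha}(x)
    + \tfrac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx .
\]
% As \alpha \to 0 this recovers the Kullback-Leibler divergence (so the
% MDPDE approaches maximum likelihood); larger \alpha buys robustness to
% outliers at some cost in efficiency.
```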
Related papers
- Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z) - Sparse PCA with Oracle Property [115.72363972222622]
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA [43.106438224356175]
- Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA [43.106438224356175]
We develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees.
We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
arXiv Detail & Related papers (2023-05-04T04:45:16Z) - Robust factored principal component analysis for matrix-valued outlier
accommodation and detection [4.228971753938522]
Factored PCA (FPCA) is a probabilistic extension of PCA for matrix data.
We propose a robust extension of FPCA (RFPCA) for matrix data.
RFPCA can adaptively down-weight outliers and yield robust estimates.
arXiv Detail & Related papers (2021-12-13T16:12:22Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
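For reference, the weighted least squares estimator referred to in the entry above, which is the best linear unbiased estimator when the weight matrix is the inverse noise covariance:

```latex
% Weighted least squares for the linear model y = X\beta + \varepsilon,
% with \mathrm{Cov}(\varepsilon) = \Sigma and weights W = \Sigma^{-1}:
\[
  \hat\beta_{\mathrm{WLS}} = (X^\top W X)^{-1} X^\top W y .
\]
% By the Gauss-Markov (Aitken) theorem, \hat\beta_{\mathrm{WLS}} has the
% smallest covariance among linear unbiased estimators of \beta.
```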
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - Robust Principal Component Analysis: A Median of Means Approach [17.446104539598895]
Principal Component Analysis is a tool for data visualization, denoising, and dimensionality reduction.
Recent supervised learning methods have shown great success in dealing with outlying observations.
This paper proposes a PCA procedure based on the MoM principle.
arXiv Detail & Related papers (2021-02-05T19:59:05Z) - Modal Principal Component Analysis [3.050919759387985]
- Modal Principal Component Analysis [3.050919759387985]
It has been shown that the robustness of many statistical methods can be improved using mode estimation instead of mean estimation.
This study proposes modal principal component analysis (MPCA), a robust PCA method based on mode estimation.
arXiv Detail & Related papers (2020-08-07T23:59:05Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a
Robust Divergence Estimator [95.71091446753414]
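A minimal sketch of the thresholding idea for sparse PCA: compute the leading eigenvector of the covariance, zero out small loadings, and renormalize. The threshold value here is an illustrative assumption; the paper's algorithm and guarantees are not reproduced:

```python
import numpy as np

def threshold_spca(X, tau=0.1):
    """Sparse leading direction via hard-thresholding the top eigenvector."""
    cov = np.cov(X.T)
    vals, vecs = np.linalg.eigh(cov)
    v = vecs[:, -1]                            # leading eigenvector (dense loadings)
    v = np.where(np.abs(v) >= tau, v, 0.0)     # hard-threshold small loadings
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

X = np.random.default_rng(2).normal(size=(300, 20))
X[:, 0] += 3.0 * np.random.default_rng(3).normal(size=300)  # one strong direction
print(np.count_nonzero(threshold_spca(X)))
```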
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)