Robust and Differentially Private PCA for non-Gaussian data
- URL: http://arxiv.org/abs/2507.15232v1
- Date: Mon, 21 Jul 2025 04:27:09 GMT
- Title: Robust and Differentially Private PCA for non-Gaussian data
- Authors: Minwoo Kim, Sungkyu Jung,
- Abstract summary: We propose a differentially private PCA method applicable to heavy-tailed and potentially contaminated data.<n>By applying a bounded transformation, we enable straightforward computation of principal components in a differentially private manner.<n>Our method consistently outperforms existing approaches in terms of statistical utility.
- Score: 3.744589644319257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have sparked significant interest in the development of privacy-preserving Principal Component Analysis (PCA). However, many existing approaches rely on restrictive assumptions, such as assuming sub-Gaussian data or being vulnerable to data contamination. Additionally, some methods are computationally expensive or depend on unknown model parameters that must be estimated, limiting their accessibility for data analysts seeking privacy-preserving PCA. In this paper, we propose a differentially private PCA method applicable to heavy-tailed and potentially contaminated data. Our approach leverages the property that the covariance matrix of properly rescaled data preserves eigenvectors and their order under elliptical distributions, which include Gaussian and heavy-tailed distributions. By applying a bounded transformation, we enable straightforward computation of principal components in a differentially private manner. Additionally, boundedness guarantees robustness against data contamination. We conduct both theoretical analysis and empirical evaluations of the proposed method, focusing on its ability to recover the subspace spanned by the leading principal components. Extensive numerical experiments demonstrate that our method consistently outperforms existing approaches in terms of statistical utility, particularly in non-Gaussian or contaminated data settings.
Related papers
- Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.<n>We propose a method called Stratified Prediction-Powered Inference (StratPPI)<n>We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - Adaptive Differentially Quantized Subspace Perturbation (ADQSP): A Unified Framework for Privacy-Preserving Distributed Average Consensus [6.364764301218972]
We propose a general approach named adaptive differentially quantized subspace (ADQSP)
We show that by varying a single quantization parameter the proposed method can vary between SMPC-type performances and DP-type performances.
Our results show the potential of exploiting traditional distributed signal processing tools for providing cryptographic guarantees.
arXiv Detail & Related papers (2023-12-13T07:52:16Z) - On the Privacy-Robustness-Utility Trilemma in Distributed Learning [7.778461949427662]
We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines.
Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility.
arXiv Detail & Related papers (2023-02-09T17:24:18Z) - Stochastic and Private Nonconvex Outlier-Robust PCA [11.688030627514532]
Outlier-robust PCA seeks an underlying low-dimensional linear subspace from a dataset corrupted with outliers.
We show that our methods involve our methods, which involve a geodesic descent and a novel convergence analysis.
The main application method is an effectively private algorithm for outlier-robust PCA.
arXiv Detail & Related papers (2022-03-17T12:00:47Z) - Inference for Heteroskedastic PCA with Missing Data [16.54456304614719]
This paper shows how to construct confidence regions for principal component analysis (PCA) in high data sets.
We develop non-asymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace.
arXiv Detail & Related papers (2021-07-26T17:59:01Z) - Incorporating Causal Graphical Prior Knowledge into Predictive Modeling
via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.