Inference for Heteroskedastic PCA with Missing Data
- URL: http://arxiv.org/abs/2107.12365v2
- Date: Wed, 28 Feb 2024 17:22:09 GMT
- Title: Inference for Heteroskedastic PCA with Missing Data
- Authors: Yuling Yan, Yuxin Chen, Jianqing Fan
- Abstract summary: This paper shows how to construct confidence regions for principal component analysis (PCA) in high dimensions.
We develop non-asymptotic distributional guarantees for HeteroPCA and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix.
- Score: 16.54456304614719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies how to construct confidence regions for principal
component analysis (PCA) in high dimension, a problem that has been vastly
under-explored. While computing measures of uncertainty for nonlinear/nonconvex
estimators is in general difficult in high dimension, the challenge is further
compounded by the prevalent presence of missing data and heteroskedastic noise.
We propose a novel approach to performing valid inference on the principal
subspace under a spiked covariance model with missing data, on the basis of an
estimator called HeteroPCA (Zhang et al., 2022). We develop non-asymptotic
distributional guarantees for HeteroPCA, and demonstrate how these can be
invoked to compute both confidence regions for the principal subspace and
entrywise confidence intervals for the spiked covariance matrix. Our inference
procedures are fully data-driven and adaptive to heteroskedastic random noise,
without requiring prior knowledge about the noise levels.
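As a rough illustration of the estimator underlying this paper, the diagonal-deletion iteration of HeteroPCA (Zhang et al., 2022) can be sketched in a few lines of NumPy. This is a minimal sketch of the generic idea only, not the authors' inference procedure; the function name and defaults are illustrative.

```python
import numpy as np

def hetero_pca(G, r, n_iter=50):
    """Sketch of the HeteroPCA iteration: repeatedly replace the diagonal
    of the (symmetric) Gram matrix with the diagonal of its best rank-r
    approximation, removing the bias that heteroskedastic noise adds
    to the diagonal entries."""
    M = G.copy()
    np.fill_diagonal(M, 0.0)                    # delete the noise-biased diagonal
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M, hermitian=True)
        M_r = (U[:, :r] * s[:r]) @ Vt[:r, :]    # best rank-r approximation
        np.fill_diagonal(M, np.diag(M_r))       # impute only the diagonal entries
    return U[:, :r]                             # estimated principal subspace
```

Because the off-diagonal entries are never modified, the iteration converges to the noiseless low-rank matrix whenever those entries are uncontaminated, as in the diagonal-noise toy example below.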
Related papers
- Robust and Differentially Private PCA for non-Gaussian data [3.744589644319257]
We propose a differentially private PCA method applicable to heavy-tailed and potentially contaminated data.
By applying a bounded transformation, we enable straightforward computation of principal components in a differentially private manner.
Our method consistently outperforms existing approaches in terms of statistical utility.
arXiv Detail & Related papers (2025-07-21T04:27:09Z) - Fiducial Matching: Differentially Private Inference for Categorical Data [0.0]
Statistical inference remains an open area of investigation in a differentially private (DP) setting.
We propose a simulation-based matching approach, solved through tools from the fiducial framework.
We focus on the analysis of categorical (nominal) data that is common in national surveys.
arXiv Detail & Related papers (2025-07-15T21:56:15Z) - Assumption-Lean Post-Integrated Inference with Negative Control Outcomes [0.0]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes.
Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators.
The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
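For context, the basic (unstratified) prediction-powered mean estimator can be written in a few lines: the model's average prediction on unlabeled data is debiased by the average residual on labeled data. This is a generic sketch of plain PPI, not the stratified variant proposed in the paper; the function name is illustrative.

```python
import numpy as np

def ppi_mean(y_labeled, f_labeled, f_unlabeled):
    """Basic prediction-powered estimate of a population mean:
    mean prediction on the (large) unlabeled set, plus a rectifier
    estimated from the (small) labeled set."""
    rectifier = np.mean(y_labeled - f_labeled)   # average bias of the predictor
    return np.mean(f_unlabeled) + rectifier
```

When the predictor has a constant bias, the rectifier cancels it exactly, which is what makes the estimate unbiased regardless of model quality.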
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Variational DAG Estimation via State Augmentation With Stochastic Permutations [16.57658783816741]
Estimating the structure of a Bayesian network from observational data is a statistically and computationally hard problem.
From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying space.
We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations.
arXiv Detail & Related papers (2024-02-04T23:51:04Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
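For background, the standard split conformal recipe for exchangeable data, which this work generalizes to non-exchangeable settings, is short enough to sketch. A minimal sketch assuming NumPy >= 1.22 (for the `method` keyword of `np.quantile`); names are illustrative.

```python
import numpy as np

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Standard split conformal prediction under exchangeability:
    the half-width of the prediction interval is the (1 - alpha)
    quantile of calibration residuals, with the usual finite-sample
    (n + 1) correction."""
    n = len(cal_residuals)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_residuals, min(q_level, 1.0), method="higher")
```

The returned half-width, added around any point prediction, gives marginal coverage of at least 1 - alpha when calibration and test points are exchangeable.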
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - On the Estimation Performance of Generalized Power Method for
Heteroscedastic Probabilistic PCA [21.9585534723895]
We show that, given a suitable initialization, the iterates generated by the GPM decrease at least geometrically to some threshold determined by the residual part of a certain decomposition.
In this paper, we demonstrate superior estimation performance under sub-Gaussian noise settings.
arXiv Detail & Related papers (2023-12-06T11:41:17Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Optimizing the Noise in Self-Supervised Learning: from Importance
Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Non-Linear Spectral Dimensionality Reduction Under Uncertainty [107.01839211235583]
We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches.
We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework.
arXiv Detail & Related papers (2022-02-09T19:01:33Z) - On the Role of Entropy-based Loss for Learning Causal Structures with
Continuous Optimization [27.613220411996025]
A method with non-combinatorial directed acyclic constraint, called NOTEARS, formulates the causal structure learning problem as a continuous optimization problem using least-square loss.
We show that the violation of the Gaussian noise assumption will hinder the causal direction identification.
We propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution.
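The continuous acyclicity constraint at the heart of NOTEARS, which the entropy-based loss above is paired with, is compact enough to sketch. This assumes SciPy's `expm`; the least-squares or entropy loss itself is omitted.

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """NOTEARS acyclicity function h(W) = tr(exp(W * W)) - d,
    where * is the elementwise (Hadamard) product and d is the
    number of nodes; h(W) = 0 iff the weighted graph W is a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d
```

Because h is smooth, it can be used as an equality constraint (or penalty) in continuous optimization, which is what turns combinatorial structure learning into the continuous problem described above.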
arXiv Detail & Related papers (2021-06-05T08:29:51Z) - Robust Bayesian Inference for Discrete Outcomes with the Total Variation
Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination.
Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD).
We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
arXiv Detail & Related papers (2020-10-26T09:53:06Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.