On Model Identification and Out-of-Sample Prediction of Principal
Component Regression: Applications to Synthetic Controls
- URL: http://arxiv.org/abs/2010.14449v5
- Date: Fri, 25 Aug 2023 17:33:22 GMT
- Title: On Model Identification and Out-of-Sample Prediction of Principal
Component Regression: Applications to Synthetic Controls
- Authors: Anish Agarwal, Devavrat Shah, Dennis Shen
- Abstract summary: We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design.
We establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates.
- Score: 20.96904429337912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze principal component regression (PCR) in a high-dimensional
error-in-variables setting with fixed design. Under suitable conditions, we
show that PCR consistently identifies the unique model with minimum
$\ell_2$-norm. These results enable us to establish non-asymptotic
out-of-sample prediction guarantees that improve upon the best known rates. In
the course of our analysis, we introduce a natural linear algebraic condition
between the in- and out-of-sample covariates, which allows us to avoid
distributional assumptions for out-of-sample predictions. Our simulations
illustrate the importance of this condition for generalization, even under
covariate shifts. Accordingly, we construct a hypothesis test to check when
this condition holds in practice. As a byproduct, our results also lead to
novel results for the synthetic controls literature, a leading approach for
policy evaluation. To the best of our knowledge, our prediction guarantees for
the fixed design setting have been elusive in both the high-dimensional
error-in-variables and synthetic controls literatures.
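As a rough illustration of the mechanics described in the abstract, the sketch below implements vanilla PCR (regress on a rank-k truncation of the design, which yields the minimum $\ell_2$-norm least-squares solution on that truncation) and a naive numerical check of the linear-algebraic condition between in- and out-of-sample covariates (each out-of-sample covariate approximately lying in the span of the top-k right singular vectors of the in-sample matrix). This is plain NumPy under our own simplifying assumptions; the function names and the tolerance-based check are illustrative, not the paper's hypothesis test.

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression via rank-k truncated SVD.

    Regressing y on the top-k principal components of X and mapping back
    gives the minimum l2-norm least-squares solution on the rank-k
    approximation of X (the truncated pseudoinverse applied to y).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k]
    # beta_hat = V_k diag(1/s_k) U_k^T y
    return Vtk.T @ ((Uk.T @ y) / sk)

def rowspace_inclusion(X_in, X_out, k, tol=1e-6):
    """Crude numerical check of the subspace-inclusion condition:
    does each out-of-sample covariate vector (row of X_out) lie,
    up to `tol` relative error, in the span of the top-k right
    singular vectors of the in-sample matrix X_in?"""
    _, _, Vt = np.linalg.svd(X_in, full_matrices=False)
    Vk = Vt[:k].T                          # (p, k) orthonormal basis
    resid = X_out - X_out @ Vk @ Vk.T      # residual after projection
    rel = np.linalg.norm(resid, axis=1) / np.maximum(
        np.linalg.norm(X_out, axis=1), tol)
    return rel < tol
```

On a noiseless low-rank design with the signal in the row space, the PCR fit reproduces the in-sample responses exactly, and the inclusion check distinguishes out-of-sample rows drawn from that row space from generic ones.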
Related papers
- Semi-supervised Regression Analysis with Model Misspecification and High-dimensional Data [8.619243141968886]
We present an inference framework for estimating regression coefficients in conditional mean models.
We develop an augmented inverse probability weighted (AIPW) method, employing regularized estimators for both propensity score (PS) and outcome regression (OR) models.
Our theoretical findings are verified through extensive simulation studies and a real-world data application.
arXiv Detail & Related papers (2024-06-20T00:34:54Z) - Prognostic Covariate Adjustment for Logistic Regression in Randomized
Controlled Trials [1.5020330976600735]
We show that prognostic score adjustment can increase the power of the Wald test for the conditional odds ratio under a fixed sample size.
We utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands.
arXiv Detail & Related papers (2024-02-29T06:53:16Z) - High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed the pairs estimator.
By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance.
Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows one to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - Conformalized Unconditional Quantile Regression [27.528258690139793]
We develop a predictive inference procedure that combines conformal prediction with unconditional quantile regression.
We show that our procedure is adaptive to heteroscedasticity, provides transparent coverage guarantees that are relevant to the test instance at hand, and performs competitively with existing methods in terms of efficiency.
arXiv Detail & Related papers (2023-04-04T00:20:26Z) - Probabilistic Conformal Prediction Using Conditional Random Samples [73.26753677005331]
PCP is a predictive inference algorithm that estimates a target variable by a discontinuous predictive set.
It is efficient and compatible with either explicit or implicit conditional generative models.
arXiv Detail & Related papers (2022-06-14T03:58:03Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - Imputation for High-Dimensional Linear Regression [8.841513006680886]
We show that LASSO retains the minimax estimation rate in the random setting.
arXiv Detail & Related papers (2020-01-24T19:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.