On Model Identification and Out-of-Sample Prediction of Principal
Component Regression: Applications to Synthetic Controls
- URL: http://arxiv.org/abs/2010.14449v5
- Date: Fri, 25 Aug 2023 17:33:22 GMT
- Title: On Model Identification and Out-of-Sample Prediction of Principal
Component Regression: Applications to Synthetic Controls
- Authors: Anish Agarwal, Devavrat Shah, Dennis Shen
- Abstract summary: We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design.
We establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates.
- Score: 20.96904429337912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze principal component regression (PCR) in a high-dimensional
error-in-variables setting with fixed design. Under suitable conditions, we
show that PCR consistently identifies the unique model with minimum
$\ell_2$-norm. These results enable us to establish non-asymptotic
out-of-sample prediction guarantees that improve upon the best known rates. In
the course of our analysis, we introduce a natural linear algebraic condition
between the in- and out-of-sample covariates, which allows us to avoid
distributional assumptions for out-of-sample predictions. Our simulations
illustrate the importance of this condition for generalization, even under
covariate shifts. Accordingly, we construct a hypothesis test to check when
this condition holds in practice. As a byproduct, our results also lead to
novel results for the synthetic controls literature, a leading approach for
policy evaluation. To the best of our knowledge, our prediction guarantees for
the fixed design setting have been elusive in both the high-dimensional
error-in-variables and synthetic controls literatures.
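As a rough illustration of the mechanics described in the abstract, the sketch below implements vanilla PCR (regress on a rank-k truncation of the design, which yields the minimum $\ell_2$-norm least-squares solution on that truncation) and a naive numerical check of the linear-algebraic condition between in- and out-of-sample covariates (each out-of-sample covariate approximately lying in the span of the top-k right singular vectors of the in-sample matrix). This is plain NumPy under our own simplifying assumptions; the function names and the tolerance-based check are illustrative, not the paper's hypothesis test.

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression via rank-k truncated SVD.

    Regressing y on the top-k principal components of X and mapping back
    gives the minimum l2-norm least-squares solution on the rank-k
    approximation of X (the truncated pseudoinverse applied to y).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k]
    # beta_hat = V_k diag(1/s_k) U_k^T y
    return Vtk.T @ ((Uk.T @ y) / sk)

def rowspace_inclusion(X_in, X_out, k, tol=1e-6):
    """Crude numerical check of the subspace-inclusion condition:
    does each out-of-sample covariate vector (row of X_out) lie,
    up to `tol` relative error, in the span of the top-k right
    singular vectors of the in-sample matrix X_in?"""
    _, _, Vt = np.linalg.svd(X_in, full_matrices=False)
    Vk = Vt[:k].T                          # (p, k) orthonormal basis
    resid = X_out - X_out @ Vk @ Vk.T      # residual after projection
    rel = np.linalg.norm(resid, axis=1) / np.maximum(
        np.linalg.norm(X_out, axis=1), tol)
    return rel < tol
```

On a noiseless low-rank design with the signal in the row space, the PCR fit reproduces the in-sample responses exactly, and the inclusion check distinguishes out-of-sample rows drawn from that row space from generic ones.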
Related papers
- Semi-supervised Regression Analysis with Model Misspecification and High-dimensional Data [8.619243141968886]
We present an inference framework for estimating regression coefficients in conditional mean models.
We develop an augmented inverse probability weighted (AIPW) method, employing regularized estimators for both propensity score (PS) and outcome regression (OR) models.
Our theoretical findings are verified through extensive simulation studies and a real-world data application.
arXiv Detail & Related papers (2024-06-20T00:34:54Z) - Prognostic Covariate Adjustment for Logistic Regression in Randomized
Controlled Trials [1.5020330976600735]
We show that prognostic score adjustment can increase the power of the Wald test for the conditional odds ratio under a fixed sample size.
We utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands.
arXiv Detail & Related papers (2024-02-29T06:53:16Z) - High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed the pairs estimator.
By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance.
Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows one to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - Conformalized Unconditional Quantile Regression [27.528258690139793]
We develop a predictive inference procedure that combines conformal prediction with unconditional quantile regression.
We show that our procedure is adaptive to heteroscedasticity, provides transparent coverage guarantees that are relevant to the test instance at hand, and performs competitively with existing methods in terms of efficiency.
arXiv Detail & Related papers (2023-04-04T00:20:26Z) - Probabilistic Conformal Prediction Using Conditional Random Samples [73.26753677005331]
PCP is a predictive inference algorithm that estimates a target variable by a discontinuous predictive set.
It is efficient and compatible with either explicit or implicit conditional generative models.
arXiv Detail & Related papers (2022-06-14T03:58:03Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z) - Imputation for High-Dimensional Linear Regression [8.841513006680886]
We show that LASSO retains the minimax estimation rate in the random setting.
arXiv Detail & Related papers (2020-01-24T19:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.