Prediction in latent factor regression: Adaptive PCR and beyond
- URL: http://arxiv.org/abs/2007.10050v2
- Date: Fri, 23 Apr 2021 16:34:47 GMT
- Title: Prediction in latent factor regression: Adaptive PCR and beyond
- Authors: Xin Bing, Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp
- Abstract summary: We prove a master theorem that establishes a risk bound for a large class of predictors.
We use our main theorem to recover known risk bounds for the minimum-norm interpolating predictor.
We conclude with a detailed simulation study to support and complement our theoretical results.
- Score: 2.9439848714137447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is devoted to the finite sample prediction risk analysis of a class
of linear predictors of a response $Y\in \mathbb{R}$ from a high-dimensional
random vector $X\in \mathbb{R}^p$ when $(X,Y)$ follows a latent factor
regression model generated by an unobservable latent vector $Z$ of dimension
less than $p$. Our primary contribution is in establishing finite sample risk
bounds for prediction with the ubiquitous Principal Component Regression (PCR)
method, under the factor regression model, with the number of principal
components adaptively selected from the data -- a form of theoretical guarantee
that is surprisingly lacking from the PCR literature. To accomplish this, we
prove a master theorem that establishes a risk bound for a large class of
predictors, including the PCR predictor as a special case. This approach has
the benefit of providing a unified framework for the analysis of a wide range
of linear prediction methods, under the factor regression setting. In
particular, we use our main theorem to recover known risk bounds for the
minimum-norm interpolating predictor, which has received renewed attention in
the past two years, and a prediction method tailored to a subclass of factor
regression models with identifiable parameters. This model-tailored method can
be interpreted as prediction via clusters with latent centers.
To address the problem of selecting among a set of candidate predictors, we
analyze a simple model selection procedure based on data-splitting, providing
an oracle inequality under the factor model to prove that the performance of
the selected predictor is close to the optimal candidate. We conclude with a
detailed simulation study to support and complement our theoretical results.
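To make the adaptive-PCR pipeline concrete, here is a minimal numpy sketch (not the authors' code; the cumulative-explained-variance rule and the 50/50 split below are illustrative assumptions): choose the number of components from the data, regress on the leading principal components, and pick among candidate predictors by data splitting, as in the abstract's model selection procedure.

```python
import numpy as np

def select_k(X, threshold=0.9):
    # Data-driven choice of k: smallest k explaining `threshold` of the variance.
    s = np.linalg.svd(X - X.mean(0), compute_uv=False)
    var = s ** 2 / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(var), threshold) + 1)

def pcr_fit(X, y, k):
    # Regress the centered response on the top-k principal component scores.
    mu, ybar = X.mean(0), y.mean()
    V_k = np.linalg.svd(X - mu, full_matrices=False)[2][:k].T   # p x k loadings
    theta = np.linalg.lstsq((X - mu) @ V_k, y - ybar, rcond=None)[0]
    return mu, ybar, V_k, theta

def pcr_predict(model, X_new):
    mu, ybar, V_k, theta = model
    return ybar + (X_new - mu) @ V_k @ theta

def select_by_splitting(X, y, candidates, frac=0.5, seed=0):
    # Fit each candidate on one half, keep the smallest validation risk.
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, va = idx[: int(frac * len(y))], idx[int(frac * len(y)):]
    best, best_risk = None, np.inf
    for fit, predict in candidates:
        model = fit(X[tr], y[tr])
        risk = np.mean((y[va] - predict(model, X[va])) ** 2)
        if risk < best_risk:
            best, best_risk = (predict, model), risk
    return best, best_risk

# Usage: adaptive PCR as one candidate among several.
# candidates = [(lambda X, y: pcr_fit(X, y, select_k(X)), pcr_predict)]
# (predict, model), risk = select_by_splitting(X, y, candidates)
```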
Related papers
- Progression: an extrapolation principle for regression [0.0]
We propose a novel statistical extrapolation principle.
It assumes a simple relationship between predictors and the response at the boundary of the training predictor samples.
Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range.
arXiv Detail & Related papers (2024-10-30T17:29:51Z)
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
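For intuition about conformal guarantees of this kind, here is the standard split-conformal construction (a minimal sketch, not the paper's sequential greedy filtering; `predict` is any pretrained point predictor):

```python
import numpy as np

def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity scores on held-out calibration data.
    scores = np.abs(y_cal - predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile gives coverage >= 1 - alpha.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    center = predict(X_test)
    return center - q, center + q
```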
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time-series forecasting.
We validate our theory across a variety of high-dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
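A toy simulation of the phenomenon this paper characterizes (the AR(1) noise and dimensions below are illustrative assumptions, not the paper's setup): when samples are correlated, in-sample ridge error can understate out-of-sample risk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, rho = 200, 50, 1.0, 0.8

def ar1_noise(n, rho, rng):
    # Noise correlated across samples (AR(1)), violating i.i.d. assumptions.
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + rng.normal()
    return eps

beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + ar1_noise(n, rho, rng)

# Ridge estimator and its in-sample (training) error.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
in_risk = np.mean((y - X @ beta_hat) ** 2)

# A fresh correlated draw approximates the out-of-sample risk.
X_new = rng.normal(size=(n, p))
y_new = X_new @ beta + ar1_noise(n, rho, rng)
out_risk = np.mean((y_new - X_new @ beta_hat) ** 2)
print(f"in-sample {in_risk:.3f}  vs  out-of-sample {out_risk:.3f}")
```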
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation [14.194212772887699]
We consider meta-learning within the framework of high-dimensional random-effects linear models.
We show the precise behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task.
We propose and analyze an estimator of the inverse covariance of the random regression coefficients based on data from the training tasks.
arXiv Detail & Related papers (2024-03-27T21:18:43Z)
- Conformalized Selective Regression [2.3964255330849356]
We propose a novel approach to selective regression by leveraging conformal prediction.
We show how our proposed approach, conformalized selective regression, demonstrates an advantage over multiple state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-26T04:43:50Z)
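One way to picture selective regression via conformal prediction (a hedged sketch under assumed ingredients, not the paper's algorithm: `f` is a point predictor and `sigma` any per-point difficulty estimate): calibrate a normalized score, then abstain wherever the resulting interval is too wide.

```python
import numpy as np

def selective_predict(f, sigma, X_cal, y_cal, X_test, alpha=0.1, budget=2.0):
    # Normalized nonconformity scores on calibration data.
    scores = np.abs(y_cal - f(X_cal)) / sigma(X_cal)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    width = q * sigma(X_test)          # per-point interval half-width
    accept = width <= budget           # abstain where uncertainty is high
    return f(X_test), accept
```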
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides on a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
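A compact sketch of multi-hypothesis regression with an RBF feature layer (the Gaussian features and winner-takes-all refit loop are illustrative assumptions; the paper's structured architecture differs in detail):

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    # Gaussian radial basis features around fixed centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_hypotheses(X, y, centers, n_heads=3, n_iter=20, seed=0):
    # Winner-takes-all refitting: each point trains only its best head,
    # so the heads specialize on different modes of the target.
    rng = np.random.default_rng(seed)
    Phi = rbf_features(X, centers)
    W = rng.normal(size=(n_heads, Phi.shape[1]))
    for _ in range(n_iter):
        assign = np.argmin((Phi @ W.T - y[:, None]) ** 2, axis=1)
        for h in range(n_heads):
            mask = assign == h
            if mask.any():
                W[h] = np.linalg.lstsq(Phi[mask], y[mask], rcond=None)[0]
    return W   # predictions: rbf_features(X_new, centers) @ W.T, one column per hypothesis
```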
- Transfer Learning with Random Coefficient Ridge Regression [2.0813318162800707]
Ridge regression with random coefficients provides an important alternative to fixed-coefficient regression in the high-dimensional setting.
This paper considers estimation and prediction of random coefficient ridge regression in the setting of transfer learning.
arXiv Detail & Related papers (2023-06-28T04:36:37Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Interpolating Predictors in High-Dimensional Factor Regression [2.1055643409860743]
This work studies finite-sample properties of the risk of the minimum-norm interpolating predictor in high-dimensional regression models.
We show that the min-norm interpolating predictor can have similar risk to predictors based on principal components regression and ridge regression, and can improve over LASSO based predictors, in the high-dimensional regime.
arXiv Detail & Related papers (2020-02-06T22:08:36Z)
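A small end-to-end simulation in the spirit of this comparison (all dimensions and noise levels below are illustrative assumptions): under a latent factor model with $p > n$, the minimum-norm interpolator `pinv(X) @ y` and PCR with the oracle number of factors can land in the same risk ballpark.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 100, 500, 5                           # p > n: interpolation regime
A = rng.normal(size=(p, K))                     # factor loadings
b = rng.normal(size=K)                          # regression on latent factors
Z = rng.normal(size=(n, K))
X = Z @ A.T + 0.1 * rng.normal(size=(n, p))
y = Z @ b + 0.1 * rng.normal(size=n)

# Minimum-norm interpolating predictor.
beta_mn = np.linalg.pinv(X) @ y

# PCR with the oracle number of components K.
mu, ybar = X.mean(0), y.mean()
V_K = np.linalg.svd(X - mu, full_matrices=False)[2][:K].T
theta = np.linalg.lstsq((X - mu) @ V_K, y - ybar, rcond=None)[0]

# Fresh test data from the same factor model.
Z_t = rng.normal(size=(n, K))
X_t = Z_t @ A.T + 0.1 * rng.normal(size=(n, p))
y_t = Z_t @ b + 0.1 * rng.normal(size=n)
risk_mn = np.mean((y_t - X_t @ beta_mn) ** 2)
risk_pcr = np.mean((y_t - (ybar + (X_t - mu) @ V_K @ theta)) ** 2)
print(f"min-norm {risk_mn:.3f}  vs  PCR {risk_pcr:.3f}")
```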
This list is automatically generated from the titles and abstracts of the papers in this site.