On Optimal Interpolation In Linear Regression
- URL: http://arxiv.org/abs/2110.11258v1
- Date: Thu, 21 Oct 2021 16:37:10 GMT
- Title: On Optimal Interpolation In Linear Regression
- Authors: Eduard Oravkin, Patrick Rebeschini
- Abstract summary: We investigate the optimal way to interpolate in linear regression using functions that are linear in the response variable.
We identify a regime where the minimum-norm interpolator provably generalizes arbitrarily worse than the optimal response-linear achievable interpolator.
We extend the notion of optimal response-linear interpolation to random features regression under a linear data-generating model.
- Score: 22.310861786709538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding when and why interpolating methods generalize well has recently
been a topic of interest in statistical learning theory. However,
systematically connecting interpolating methods to achievable notions of
optimality has only received partial attention. In this paper, we investigate
the question of what is the optimal way to interpolate in linear regression
using functions that are linear in the response variable (as is the case for
the Bayes optimal estimator in ridge regression) and depend on the data, the
population covariance of the data, the signal-to-noise ratio, and the
covariance of the prior for the signal, but do not depend on the value of the
signal itself or on the noise vector in the training data. We provide a closed-form
expression for the interpolator that achieves this notion of optimality and
show that it can be derived as the limit of preconditioned gradient descent
with a specific initialization. We identify a regime where the minimum-norm
interpolator provably generalizes arbitrarily worse than the optimal
response-linear achievable interpolator that we introduce, and validate with
numerical experiments that the notion of optimality we consider can be achieved
by interpolating methods that only use the training data as input in the case
of an isotropic prior. Finally, we extend the notion of optimal response-linear
interpolation to random features regression under a linear data-generating
model that has been previously studied in the literature.
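The mechanism behind the preconditioned-gradient-descent claim can be illustrated with a standard special case. The NumPy sketch below uses an arbitrary diagonal preconditioner P (not the paper's optimal choice, which would be built from the population covariance, signal-to-noise ratio, and prior covariance): preconditioned gradient descent on the square loss, started at zero, converges to the interpolator that is minimum-norm in the metric induced by P, namely beta = P X^T (X P X^T)^{-1} y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                  # overparameterized: d > n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Arbitrary positive-definite diagonal preconditioner (NOT the paper's
# optimal choice; purely illustrative).
P = np.diag(rng.uniform(0.5, 2.0, size=d))

# Closed-form P-weighted minimum-norm interpolator:
# argmin beta^T P^{-1} beta  subject to  X beta = y.
beta_closed = P @ X.T @ np.linalg.solve(X @ P @ X.T, y)

# Preconditioned gradient descent from zero on 0.5 * ||X beta - y||^2:
# beta <- beta - lr * P @ grad, with grad = X^T (X beta - y).
beta = np.zeros(d)
lr = 1.0 / np.linalg.norm(X @ P @ X.T, 2)       # safe step size
for _ in range(20_000):
    beta -= lr * (P @ (X.T @ (X @ beta - y)))

print(np.abs(X @ beta - y).max())               # ~0: beta interpolates
print(np.abs(beta - beta_closed).max())         # ~0: matches closed form
```

Changing P changes which interpolator the dynamics select; identifying the optimal choice within this response-linear family is, per the abstract, what the paper does.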
Related papers
- Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
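For orientation, the generic Gaussian-mechanism baseline that sparsity-aware methods improve upon looks as follows; this is a standard DP-SGD-style step, not the paper's algorithm, and every hyperparameter value is a placeholder.

```python
import numpy as np

def dp_gradient_step(w, per_example_grads, lr=0.1, clip=1.0, sigma=2.0, rng=None):
    """One generic DP-SGD-style step: clip each per-example gradient in l2
    norm, average, and add isotropic Gaussian noise. Standard baseline only;
    the paper's contribution is to beat its rates when the individual
    gradients are sparse."""
    rng = np.random.default_rng(rng)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_mean = (sum(clipped) / len(clipped)
                  + rng.normal(0.0, sigma * clip / len(clipped), size=w.shape))
    return w - lr * noisy_mean
```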
arXiv Detail & Related papers (2024-04-16T20:01:10Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
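The operator-theoretic view invoked here can be made concrete with the classical Krasnosel'skii-Mann iteration, which is precisely a linear interpolation between the current point and an operator step. The sketch below is a generic illustration of that principle, not the paper's analysis.

```python
import numpy as np

def km_iterate(z0, T, alpha=0.5, steps=500):
    """Krasnosel'skii-Mann iteration: z <- (1 - alpha) z + alpha T(z).
    When T is nonexpansive and has a fixed point, these averaged iterates
    converge to a fixed point even where the plain iteration z <- T(z)
    would oscillate."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z = (1.0 - alpha) * z + alpha * T(z)
    return z

# Toy usage: T is a rotation (nonexpansive, fixed point at the origin);
# plain iteration z <- T(z) circles forever, the averaged one contracts.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(km_iterate([1.0, 0.0], lambda z: R @ z))   # ~ [0, 0]
```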
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Adaptive Linear Estimating Equations [5.985204759362746]
In this paper, we propose a general method for constructing debiased estimators.
It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality.
A salient feature of our estimator is that, in the context of multi-armed bandits, it retains non-asymptotic performance guarantees.
arXiv Detail & Related papers (2023-07-14T12:55:47Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Shuffled linear regression through graduated convex relaxation [12.614901374282868]
The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between input and output is unknown.
This problem arises in a wide range of applications including survey data.
We propose a novel optimization algorithm for shuffled linear regression based on a posterior-maximizing objective function.
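As a point of reference, a common baseline for shuffled regression (not the paper's graduated convex relaxation) alternates between fitting the weights and re-estimating the correspondence as an assignment problem; a minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shuffled_ls(X, y, iters=50, seed=0):
    """Alternating minimization for shuffled linear regression:
    (1) fit w by least squares under the current correspondence,
    (2) re-match responses to predictions via optimal assignment.
    A simple heuristic baseline; it can get stuck in local optima."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))             # initial correspondence guess
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
        pred = X @ w
        # cost[i, j]: squared error if response j is matched to row i
        cost = (pred[:, None] - y[None, :]) ** 2
        _, perm = linear_sum_assignment(cost)
    return w, perm
```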
arXiv Detail & Related papers (2022-09-30T17:33:48Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions are placed on the parameters through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Piecewise linear regression and classification [0.20305676256390928]
This paper proposes a method for solving multivariate regression and classification problems using piecewise linear predictors.
A Python implementation of the algorithm described in this paper is available at http://cse.lab.imtlucca.it/bemporad/parc.
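To convey the flavor of piecewise linear fitting, here is a simplified cluster-and-refit loop. It is a stand-in, not the PARC algorithm itself (PARC additionally learns a classifier defining the partition), and the hyperparameters are arbitrary.

```python
import numpy as np

def piecewise_linear_fit(X, y, k=3, iters=20, seed=0):
    """Toy piecewise linear regression: assign each sample to whichever of
    k affine models predicts it best, then refit each model by least
    squares on its assigned samples. Simplified stand-in for PARC-style
    methods; no partition classifier is learned here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])        # affine features
    labels = rng.integers(k, size=n)
    W = np.zeros((k, d + 1))
    for _ in range(iters):
        for j in range(k):
            m = labels == j
            if m.sum() > d:                     # enough points to fit
                W[j], *_ = np.linalg.lstsq(Xb[m], y[m], rcond=None)
        labels = ((Xb @ W.T - y[:, None]) ** 2).argmin(axis=1)
    return W, labels
```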
arXiv Detail & Related papers (2021-03-10T17:07:57Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
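For orientation, the deterministic iteration that such randomized coordinate schemes build on is the Chambolle-Pock primal-dual method, whose dual step uses a linearly extrapolated primal point. The sketch below solves a small l1-regression problem and is a full-dimensional baseline, not the paper's randomly extrapolated coordinate variant.

```python
import numpy as np

def pdhg_l1_regression(A, b, lam=0.1, iters=2000):
    """Chambolle-Pock for min_x ||Ax - b||_1 + (lam/2)||x||^2, written as
    the saddle problem max_{|y|<=1} <Ax - b, y> + (lam/2)||x||^2. The dual
    update uses the extrapolated point 2*x_new - x (the linear extrapolation
    that the randomized coordinate method perturbs)."""
    m, n = A.shape
    L = np.linalg.norm(A, 2)                    # operator norm of A
    tau = sigma = 0.9 / L                       # tau * sigma * L^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # primal: prox of (lam/2)||x||^2 after a gradient step on <Ax, y>
        x_new = (x - tau * (A.T @ y)) / (1.0 + tau * lam)
        # dual: ascend on <A x_bar - b, y>, project onto the l_inf ball
        y = np.clip(y + sigma * (A @ (2 * x_new - x) - b), -1.0, 1.0)
        x = x_new
    return x
```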
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- To Each Optimizer a Norm, To Each Norm its Generalization [31.682969645989512]
We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes.
We argue that analyzing convergence to the standard maximum l2-margin solution is arbitrary and show that minimizing the norm induced by the data results in better generalization.
arXiv Detail & Related papers (2020-06-11T21:07:38Z)
- Linear predictor on linearly-generated data with missing values: non-consistency and solutions [0.0]
We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data.
We show that, in the presence of missing values, the optimal predictor may not be linear.
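This is easy to verify numerically. In the made-up Gaussian model below (all constants invented for the demo), one linear predictor on zero-imputed inputs plus the missingness mask is strictly worse than a separate linear predictor per missingness pattern, which is what the Bayes predictor reduces to here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
# y is exactly linear in (x1, x2), but x2 is missing half the time.
x = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)
y = x[:, 0] + x[:, 1]
miss = rng.random(n) < 0.5
x_imp = x.copy()
x_imp[miss, 1] = 0.0                            # zero-impute missing x2

# One linear model on (imputed features, mask): the best *linear* predictor.
Z = np.column_stack([x_imp, miss.astype(float), np.ones(n)])
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
mse_linear = np.mean((Z @ w - y) ** 2)

# Separate linear model per missingness pattern (the Bayes predictor here).
mse_pattern = 0.0
for m in (miss, ~miss):
    Zm = np.column_stack([x_imp[m], np.ones(m.sum())])
    wm, *_ = np.linalg.lstsq(Zm, y[m], rcond=None)
    mse_pattern += m.mean() * np.mean((Zm @ wm - y[m]) ** 2)

print(mse_linear, mse_pattern)                  # pattern-wise MSE is lower
```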
arXiv Detail & Related papers (2020-02-03T11:49:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.