Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
- URL: http://arxiv.org/abs/2306.08432v3
- Date: Sat, 21 Sep 2024 19:39:26 GMT
- Title: Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
- Authors: Shahar Stein Ioushua, Inbar Hasidim, Ofer Shayevitz, Meir Feder
- Abstract summary: We show the benefits of batch-partitioning through the lens of a minimum-norm overparametrized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
- Score: 12.443289202402761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparametrized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator and derive bounds on its quadratic risk. We then characterize the optimal batch size and show it is inversely proportional to the noise level, as well as to the overparametrization ratio. In contrast to minimum-norm, our estimator admits a stable risk behavior that is monotonically increasing in the overparametrization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. We further show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings. Interestingly, we observe that the implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
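As a concrete illustration of the construction the abstract describes, the numpy sketch below partitions the data into batches, computes the minimum-norm least-squares solution on each batch via the pseudoinverse, averages the per-batch estimates, and then applies a classical Wiener-style shrinkage factor. This is a minimal sketch under assumptions: the simple averaging rule, the SNR-based shrinkage coefficient, and all parameter values are illustrative choices, not necessarily the paper's exact procedure.

```python
# Hedged sketch of a batch minimum-norm estimator for overparameterized
# linear regression (d > n). Assumes per-batch minimum-norm solutions are
# simply averaged; the paper's exact aggregation and shrinkage may differ.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma2 = 100, 400, 0.5             # samples, features (d > n), noise variance
beta = rng.normal(size=d) / np.sqrt(d)   # ground-truth coefficients
X = rng.normal(size=(n, d))              # isotropic Gaussian features
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

def batch_min_norm(X, y, num_batches):
    """Average the minimum-norm least-squares solution over data batches."""
    estimates = []
    for Xb, yb in zip(np.array_split(X, num_batches),
                      np.array_split(y, num_batches)):
        # pinv yields the minimum-norm solution when the batch is underdetermined
        estimates.append(np.linalg.pinv(Xb) @ yb)
    return np.mean(estimates, axis=0)

beta_batch = batch_min_norm(X, y, num_batches=5)

# Classical Wiener-style shrinkage, signal power / (signal power + noise);
# the paper's coefficient is assumed only to be comparable in spirit.
snr = np.linalg.norm(beta) ** 2 / sigma2
beta_shrunk = (snr / (snr + 1.0)) * beta_batch

print("quadratic risk, batch estimator:   ", np.sum((beta_batch - beta) ** 2))
print("quadratic risk, shrunken estimator:", np.sum((beta_shrunk - beta) ** 2))
```

Each per-batch solution here is a noisy projection of the true coefficient vector onto a random low-dimensional subspace, matching the abstract's description of noisy projections over random subspaces; averaging across batches is what stabilizes the risk.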
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Minimax Linear Regression under the Quantile Risk [31.277788690403522]
We study the problem of designing minimax procedures in linear regression under the quantile risk.
We prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure.
arXiv Detail & Related papers (2024-06-17T23:24:14Z) - Ensemble linear interpolators: The role of ensembling [5.135730286836428]
Interpolators are unstable; for example, the minimum $\ell_2$-norm least squares interpolator exhibits unbounded test errors when dealing with noisy data.
We study how ensembling stabilizes, and thus improves, the otherwise unbounded performance, measured by the out-of-sample prediction risk, of an individual interpolator; a minimal sketch appears after this list.
arXiv Detail & Related papers (2023-09-06T20:38:04Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Optimally tackling covariate shift in RKHS-based nonparametric regression [43.457497490211985]
We show that a kernel ridge regression estimator with a carefully chosen regularization parameter is minimax rate-optimal.
We also show that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal.
We propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios.
arXiv Detail & Related papers (2022-05-06T02:33:24Z) - Non asymptotic estimation lower bounds for LTI state space models with Cramér-Rao and van Trees [1.14219428942199]
We study the estimation problem for linear time-invariant (LTI) state-space models with Gaussian excitation of an unknown covariance.
We provide non-asymptotic lower bounds for the expected estimation error and the mean square estimation risk of the least square estimator.
Our results extend and improve existing lower bounds to bounds that hold in expectation for the mean square estimation risk.
arXiv Detail & Related papers (2021-09-17T15:00:25Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z) - Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z) - Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
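For the ensemble-interpolators entry above, the following hedged numpy sketch averages minimum $\ell_2$-norm interpolators, each fit on a random subsample of the noisy data; the subsampling scheme, ensemble size, and parameter values are illustrative assumptions rather than that paper's exact procedure.

```python
# Hedged sketch: ensembling minimum-norm interpolators fit on random
# subsamples of noisy overparameterized data (d > n).
import numpy as np

rng = np.random.default_rng(1)
n, d = 120, 300
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(scale=0.7, size=n)

def ensemble_interpolator(X, y, members=10, subsample=0.6):
    """Average minimum-norm interpolators fit on random row subsamples."""
    n = X.shape[0]
    k = int(subsample * n)
    fits = []
    for _ in range(members):
        idx = rng.choice(n, size=k, replace=False)
        fits.append(np.linalg.pinv(X[idx]) @ y[idx])  # interpolates the subsample
    return np.mean(fits, axis=0)

beta_single = np.linalg.pinv(X) @ y  # single interpolator on all the data
beta_ens = ensemble_interpolator(X, y)
print("risk, single interpolator:      ", np.sum((beta_single - beta) ** 2))
print("risk, ensemble of interpolators:", np.sum((beta_ens - beta) ** 2))
```

Averaging over subsamples plays a role analogous to the batch partition in the main paper: each ensemble member sees a random portion of the data, and aggregation damps the instability of any single interpolator.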