Batches Stabilize the Minimum Norm Risk in High Dimensional
Overparameterized Linear Regression
- URL: http://arxiv.org/abs/2306.08432v2
- Date: Mon, 7 Aug 2023 10:58:21 GMT
- Title: Batches Stabilize the Minimum Norm Risk in High Dimensional
Overparameterized Linear Regression
- Authors: Shahar Stein Ioushua, Inbar Hasidim, Ofer Shayevitz and Meir Feder
- Abstract summary: We show that batch partitioning offers useful trade-offs between computational efficiency and performance.
We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk.
Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
- Score: 21.83136833217205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning algorithms that divide the data into batches are prevalent in many
machine-learning applications, typically offering useful trade-offs between
computational efficiency and performance. In this paper, we examine the
benefits of batch-partitioning through the lens of a minimum-norm
overparameterized linear regression model with isotropic Gaussian features. We
suggest a natural small-batch version of the minimum-norm estimator, and derive
an upper bound on its quadratic risk, showing it is inversely proportional to
the noise level as well as to the overparameterization ratio, for the optimal
choice of batch size. In contrast to minimum-norm, our estimator admits a
stable risk behavior that is monotonically increasing in the
overparameterization ratio, eliminating both the blowup at the interpolation
point and the double-descent phenomenon. Interestingly, we observe that this
implicit regularization offered by the batch partition is partially explained
by feature overlap between the batches. Our bound is derived via a novel
combination of techniques, in particular normal approximation in the
Wasserstein metric of noisy projections over random subspaces.
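As a rough illustration of the construction described above, the following NumPy sketch partitions the data into batches, computes the minimum-norm interpolator on each batch via the pseudoinverse, and averages the per-batch estimates. The equal-size partition and plain averaging are illustrative assumptions; the paper's exact estimator and aggregation rule may differ.

```python
import numpy as np

def min_norm(X, y):
    # Minimum-norm interpolator: w = X^+ y (Moore-Penrose pseudoinverse).
    return np.linalg.pinv(X) @ y

def batched_min_norm(X, y, num_batches, rng):
    # Split rows into disjoint batches, solve min-norm per batch, average.
    # (Equal-size batches and plain averaging are illustrative choices.)
    parts = np.array_split(rng.permutation(len(y)), num_batches)
    return np.mean([min_norm(X[p], y[p]) for p in parts], axis=0)

# Toy overparameterized setup: d > n, isotropic Gaussian features.
rng = np.random.default_rng(0)
n, d, sigma = 200, 400, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)

for b in (1, 2, 5, 10):  # b = 1 recovers the usual minimum-norm estimator
    w_hat = batched_min_norm(X, y, b, rng)
    print(f"batches={b:2d}  risk={np.sum((w_hat - w_star) ** 2):.3f}")
```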
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
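For context on what is being modified, here is a minimal sketch of the classical one-nearest-neighbor matching estimator of the average treatment effect; this is the textbook baseline, not the authors' root-n-consistent variant.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def matching_ate(X, y, treated):
    # Classical 1-NN matching estimate of the average treatment effect:
    # impute each unit's missing potential outcome with the outcome of
    # its nearest neighbor in the opposite treatment group.
    t = treated.astype(bool)
    nn_treat = NearestNeighbors(n_neighbors=1).fit(X[t])
    nn_ctrl = NearestNeighbors(n_neighbors=1).fit(X[~t])
    y1 = np.where(t, y, y[t][nn_treat.kneighbors(X)[1][:, 0]])
    y0 = np.where(~t, y, y[~t][nn_ctrl.kneighbors(X)[1][:, 0]])
    return float(np.mean(y1 - y0))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = rng.random(500) < 0.5
y = X @ np.ones(3) + 2.0 * treated + rng.normal(size=500)  # true ATE = 2
print(matching_ate(X, y, treated))
```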
- Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than gradient-based baselines.
arXiv Detail & Related papers (2023-10-21T00:03:54Z)
- Ensemble linear interpolators: The role of ensembling [5.135730286836428]
Interpolators can be unstable; for example, the minimum $\ell_2$-norm least-squares interpolator exhibits unbounded test errors when dealing with noisy data.
We study how ensembling stabilizes, and thus improves, the otherwise unbounded out-of-sample prediction risk of an individual interpolator.
arXiv Detail & Related papers (2023-09-06T20:38:04Z)
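The general ensembling idea can be sketched as follows: fit a minimum-norm interpolator on each of several random row subsamples and average the results. The subsample fraction and plain averaging are illustrative assumptions, not the paper's exact construction; note that, unlike the disjoint batches sketched earlier, the ensemble members here may overlap.

```python
import numpy as np

def ensemble_interpolator(X, y, members, frac, rng):
    # Average of minimum-norm least-squares interpolators, each fit on an
    # independent random subsample of the rows (subsamples may overlap,
    # in contrast to a disjoint batch partition).
    n = len(y)
    fits = []
    for _ in range(members):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        fits.append(np.linalg.pinv(X[idx]) @ y[idx])
    return np.mean(fits, axis=0)
```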
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Noise Estimation in Gaussian Process Regression [1.5002438468152661]
The presented method estimates both the variance of the correlated error and the variance of the noise by maximizing a marginal likelihood function.
We demonstrate the computational advantages and robustness of the presented approach compared to traditional parameter optimization.
arXiv Detail & Related papers (2022-06-20T19:36:03Z)
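Noise-variance estimation by marginal-likelihood maximization can be sketched with scikit-learn's GP tools: a WhiteKernel term absorbs the noise variance and is fit jointly with the signal kernel. The RBF-plus-white-noise model below is a standard illustration, not necessarily the paper's exact formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=100)  # true noise std = 0.3

# Fitting maximizes the log marginal likelihood over all kernel
# hyperparameters, so the WhiteKernel's noise_level (a variance) is
# estimated jointly with the RBF length scale and signal variance.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gpr.kernel_)  # fitted noise_level should approximate 0.3 ** 2
```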
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
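A minimal sketch of the algorithm in question: one-pass, single-sample SGD with a constant stepsize on the squared loss, returning the averaged iterate (iterate averaging is central to the constant-stepsize analysis; the stepsize itself is left as a caller-chosen, illustrative parameter).

```python
import numpy as np

def constant_stepsize_sgd(X, y, stepsize):
    # One pass of single-sample SGD with a constant stepsize on the
    # squared loss, returning the running average of the iterates.
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for k in range(n):
        grad = (X[k] @ w - y[k]) * X[k]  # stochastic gradient at sample k
        w -= stepsize * grad
        w_avg += (w - w_avg) / (k + 1)   # running mean of iterates
    return w_avg
```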
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in
High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
- Automated data-driven selection of the hyperparameters for
Total-Variation based texture segmentation [12.093824308505216]
The Generalized Stein Unbiased Risk Estimator is revisited to handle correlated Gaussian noise.
The problem formulation naturally entails inter-scale and spatially correlated noise.
arXiv Detail & Related papers (2020-04-20T16:43:09Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
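The randomized-truncation mechanism behind this kind of estimator (a "Russian roulette" estimator) can be illustrated on a simple convergent series; this shows the generic unbiasedness trick, not the SUMO estimator itself.

```python
import numpy as np

def russian_roulette(term, p, rng):
    # Unbiased estimate of sum_{k=0}^inf term(k): truncate at a random
    # K ~ Geometric(p) and reweight term(k) by 1 / P(K >= k + 1).
    K = rng.geometric(p)  # support {1, 2, ...}, P(K >= k + 1) = (1 - p)**k
    return sum(term(k) / (1 - p) ** k for k in range(K))

# Example: sum_{k=0}^inf 0.5 ** k = 2 exactly.
rng = np.random.default_rng(0)
estimates = [russian_roulette(lambda k: 0.5 ** k, p=0.2, rng=rng)
             for _ in range(100_000)]
print(np.mean(estimates))  # close to 2.0, since the estimator is unbiased
```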
- Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
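The pivotal property comes from not squaring the data-fit term: in the square-root Lasso objective $\|y - Xw\|_2 + \lambda \|w\|_1$, the stationarity condition makes the appropriate $\lambda$ scale like $\sqrt{2 \log d}$, independent of the noise level. A minimal convex-optimization sketch follows, using cvxpy as one possible solver; the $\lambda$ choice is the usual theoretical scale up to constants, and the problem sizes are illustrative.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 100, 200, 5
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:s] = 1.0
y = X @ w_star + 0.5 * rng.normal(size=n)

# Square-root Lasso: the residual norm is NOT squared, so a lambda on
# the order of sqrt(2 log d) works regardless of the (unknown) noise
# level -- this is what makes the estimator pivotal.
w = cp.Variable(d)
lam = np.sqrt(2 * np.log(d))
cp.Problem(cp.Minimize(cp.norm(y - X @ w, 2) + lam * cp.norm(w, 1))).solve()
print(np.round(w.value[:8], 2))  # first s entries near 1, the rest near 0
```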
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.