Batches Stabilize the Minimum Norm Risk in High Dimensional
Overparameterized Linear Regression
- URL: http://arxiv.org/abs/2306.08432v2
- Date: Mon, 7 Aug 2023 10:58:21 GMT
- Title: Batches Stabilize the Minimum Norm Risk in High Dimensional
Overparameterized Linear Regression
- Authors: Shahar Stein Ioushua, Inbar Hasidim, Ofer Shayevitz and Meir Feder
- Abstract summary: We show that batch partitioning offers useful trade-offs between computational efficiency and performance.
We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk.
Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
- Score: 21.83136833217205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning algorithms that divide the data into batches are prevalent in many
machine-learning applications, typically offering useful trade-offs between
computational efficiency and performance. In this paper, we examine the
benefits of batch-partitioning through the lens of a minimum-norm
overparameterized linear regression model with isotropic Gaussian features. We
suggest a natural small-batch version of the minimum-norm estimator, and derive
an upper bound on its quadratic risk, showing it is inversely proportional to
the noise level as well as to the overparameterization ratio, for the optimal
choice of batch size. In contrast to minimum-norm, our estimator admits a
stable risk behavior that is monotonically increasing in the
overparameterization ratio, eliminating both the blowup at the interpolation
point and the double-descent phenomenon. Interestingly, we observe that this
implicit regularization offered by the batch partition is partially explained
by feature overlap between the batches. Our bound is derived via a novel
combination of techniques, in particular normal approximation in the
Wasserstein metric of noisy projections over random subspaces.
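As a rough illustration of the construction described above, the following NumPy sketch partitions the data into batches, computes the minimum-norm interpolator on each batch via the pseudoinverse, and averages the per-batch estimates. The equal-size partition and plain averaging are illustrative assumptions; the paper's exact estimator and aggregation rule may differ.

```python
import numpy as np

def min_norm(X, y):
    # Minimum-norm interpolator: w = X^+ y (Moore-Penrose pseudoinverse).
    return np.linalg.pinv(X) @ y

def batched_min_norm(X, y, num_batches, rng):
    # Split rows into disjoint batches, solve min-norm per batch, average.
    # (Equal-size batches and plain averaging are illustrative choices.)
    parts = np.array_split(rng.permutation(len(y)), num_batches)
    return np.mean([min_norm(X[p], y[p]) for p in parts], axis=0)

# Toy overparameterized setup: d > n, isotropic Gaussian features.
rng = np.random.default_rng(0)
n, d, sigma = 200, 400, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)

for b in (1, 2, 5, 10):  # b = 1 recovers the usual minimum-norm estimator
    w_hat = batched_min_norm(X, y, b, rng)
    print(f"batches={b:2d}  risk={np.sum((w_hat - w_star) ** 2):.3f}")
```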
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
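For context on what is being modified, here is a minimal sketch of the classical one-nearest-neighbor matching estimator of the average treatment effect; this is the textbook baseline, not the authors' root-n-consistent variant.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def matching_ate(X, y, treated):
    # Classical 1-NN matching estimate of the average treatment effect:
    # impute each unit's missing potential outcome with the outcome of
    # its nearest neighbor in the opposite treatment group.
    t = treated.astype(bool)
    nn_treat = NearestNeighbors(n_neighbors=1).fit(X[t])
    nn_ctrl = NearestNeighbors(n_neighbors=1).fit(X[~t])
    y1 = np.where(t, y, y[t][nn_treat.kneighbors(X)[1][:, 0]])
    y0 = np.where(~t, y, y[~t][nn_ctrl.kneighbors(X)[1][:, 0]])
    return float(np.mean(y1 - y0))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treated = rng.random(500) < 0.5
y = X @ np.ones(3) + 2.0 * treated + rng.normal(size=500)  # true ATE = 2
print(matching_ate(X, y, treated))
```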
- Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than gradient-based baselines.
arXiv Detail & Related papers (2023-10-21T00:03:54Z)
- Ensemble linear interpolators: The role of ensembling [5.135730286836428]
Interpolators can be unstable; for example, the minimum $\ell_2$-norm least-squares interpolator exhibits unbounded test errors when dealing with noisy data.
We study how ensembling stabilizes, and thus improves, the otherwise unbounded out-of-sample prediction risk of an individual interpolator.
arXiv Detail & Related papers (2023-09-06T20:38:04Z)
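The general ensembling idea can be sketched as follows: fit a minimum-norm interpolator on each of several random row subsamples and average the results. The subsample fraction and plain averaging are illustrative assumptions, not the paper's exact construction; note that, unlike the disjoint batches sketched earlier, the ensemble members here may overlap.

```python
import numpy as np

def ensemble_interpolator(X, y, members, frac, rng):
    # Average of minimum-norm least-squares interpolators, each fit on an
    # independent random subsample of the rows (subsamples may overlap,
    # in contrast to a disjoint batch partition).
    n = len(y)
    fits = []
    for _ in range(members):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        fits.append(np.linalg.pinv(X[idx]) @ y[idx])
    return np.mean(fits, axis=0)
```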
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Noise Estimation in Gaussian Process Regression [1.5002438468152661]
The presented method estimates both the variance of the correlated error and the variance of the noise by maximizing a marginal likelihood function.
We demonstrate the computational advantages and robustness of the presented approach compared to traditional parameter optimization.
arXiv Detail & Related papers (2022-06-20T19:36:03Z)
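Noise-variance estimation by marginal-likelihood maximization can be sketched with scikit-learn's GP tools: a WhiteKernel term absorbs the noise variance and is fit jointly with the signal kernel. The RBF-plus-white-noise model below is a standard illustration, not necessarily the paper's exact formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=100)  # true noise std = 0.3

# Fitting maximizes the log marginal likelihood over all kernel
# hyperparameters, so the WhiteKernel's noise_level (a variance) is
# estimated jointly with the RBF length scale and signal variance.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gpr.kernel_)  # fitted noise_level should approximate 0.3 ** 2
```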
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
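A minimal sketch of the algorithm in question: one-pass, single-sample SGD with a constant stepsize on the squared loss, returning the averaged iterate (iterate averaging is central to the constant-stepsize analysis; the stepsize itself is left as a caller-chosen, illustrative parameter).

```python
import numpy as np

def constant_stepsize_sgd(X, y, stepsize):
    # One pass of single-sample SGD with a constant stepsize on the
    # squared loss, returning the running average of the iterates.
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for k in range(n):
        grad = (X[k] @ w - y[k]) * X[k]  # stochastic gradient at sample k
        w -= stepsize * grad
        w_avg += (w - w_avg) / (k + 1)   # running mean of iterates
    return w_avg
```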
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in
High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
- Automated data-driven selection of the hyperparameters for
Total-Variation based texture segmentation [12.093824308505216]
The Generalized Stein Unbiased Risk Estimator is revisited to handle correlated Gaussian noise.
The problem formulation naturally entails inter-scale and spatially correlated noise.
arXiv Detail & Related papers (2020-04-20T16:43:09Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
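The randomized-truncation mechanism behind this kind of estimator (a "Russian roulette" estimator) can be illustrated on a simple convergent series; this shows the generic unbiasedness trick, not the SUMO estimator itself.

```python
import numpy as np

def russian_roulette(term, p, rng):
    # Unbiased estimate of sum_{k=0}^inf term(k): truncate at a random
    # K ~ Geometric(p) and reweight term(k) by 1 / P(K >= k + 1).
    K = rng.geometric(p)  # support {1, 2, ...}, P(K >= k + 1) = (1 - p)**k
    return sum(term(k) / (1 - p) ** k for k in range(K))

# Example: sum_{k=0}^inf 0.5 ** k = 2 exactly.
rng = np.random.default_rng(0)
estimates = [russian_roulette(lambda k: 0.5 ** k, p=0.2, rng=rng)
             for _ in range(100_000)]
print(np.mean(estimates))  # close to 2.0, since the estimator is unbiased
```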
- Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
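The pivotal property comes from not squaring the data-fit term: in the square-root Lasso objective $\|y - Xw\|_2 + \lambda \|w\|_1$, the stationarity condition makes the appropriate $\lambda$ scale like $\sqrt{2 \log d}$, independent of the noise level. A minimal convex-optimization sketch follows, using cvxpy as one possible solver; the $\lambda$ choice is the usual theoretical scale up to constants, and the problem sizes are illustrative.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 100, 200, 5
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:s] = 1.0
y = X @ w_star + 0.5 * rng.normal(size=n)

# Square-root Lasso: the residual norm is NOT squared, so a lambda on
# the order of sqrt(2 log d) works regardless of the (unknown) noise
# level -- this is what makes the estimator pivotal.
w = cp.Variable(d)
lam = np.sqrt(2 * np.log(d))
cp.Problem(cp.Minimize(cp.norm(y - X @ w, 2) + lam * cp.norm(w, 1))).solve()
print(np.round(w.value[:8], 2))  # first s entries near 1, the rest near 0
```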
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.