Foolish Crowds Support Benign Overfitting
- URL: http://arxiv.org/abs/2110.02914v1
- Date: Wed, 6 Oct 2021 16:56:37 GMT
- Title: Foolish Crowds Support Benign Overfitting
- Authors: Niladri S. Chatterji and Philip M. Long
- Abstract summary: We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data.
Our analysis exposes the benefit of an effect analogous to the "wisdom of the crowd", except here the harm arising from fitting the noise is ameliorated by spreading it among many directions.
- Score: 20.102619493827024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove a lower bound on the excess risk of sparse interpolating procedures
for linear regression with Gaussian data in the overparameterized regime. We
work in a setting where the covariance structure has previously been shown to
be compatible with benign overfitting with fast convergence to the Bayes risk.
We apply the general bound to obtain a lower bound for basis pursuit (the
minimum $\ell_1$-norm interpolant) that implies that its excess risk can
converge at an exponentially slower rate than OLS (the minimum $\ell_2$-norm
interpolant), even when the ground truth is sparse. Our analysis exposes the
benefit of an effect analogous to the "wisdom of the crowd", except here the
harm arising from fitting the noise is ameliorated by spreading it among many
directions - the variance reduction arises from a foolish crowd.
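As a rough numerical illustration of the two interpolants compared in the abstract, the sketch below computes both basis pursuit (the minimum $\ell_1$-norm interpolant, written as a linear program) and the minimum $\ell_2$-norm interpolant (via the pseudoinverse) on synthetic overparameterized data with a sparse ground truth. The isotropic Gaussian covariates, dimensions, and noise level are illustrative assumptions and do not reproduce the covariance structure analyzed in the paper.
```python
# Illustrative sketch (not from the paper): compare the minimum l1-norm
# interpolant (basis pursuit) with the minimum l2-norm interpolant on
# overparameterized Gaussian data with a sparse ground truth.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, noise = 50, 500, 0.5            # n samples, d >> n features (assumed values)
theta_star = np.zeros(d)
theta_star[:3] = 1.0                  # sparse ground truth
X = rng.standard_normal((n, d))       # isotropic covariates (an assumption)
y = X @ theta_star + noise * rng.standard_normal(n)

# Minimum l2-norm interpolant: theta = X^+ y (pseudoinverse solution).
theta_l2 = np.linalg.pinv(X) @ y

# Basis pursuit: min ||theta||_1 s.t. X theta = y, as an LP over (u, v)
# with theta = u - v and u, v >= 0.
c = np.ones(2 * d)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d), method="highs")
theta_l1 = res.x[:d] - res.x[d:]

# Under isotropic covariates the excess risk is ||theta - theta_star||^2.
for name, th in [("min l2-norm interpolant", theta_l2),
                 ("basis pursuit", theta_l1)]:
    print(f"{name:24s} excess risk = {np.sum((th - theta_star) ** 2):.3f}")
```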
Related papers
- Minimax Linear Regression under the Quantile Risk [31.277788690403522]
We study the problem of designing minimax procedures in linear regression under the quantile risk.
We prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure.
arXiv Detail & Related papers (2024-06-17T23:24:14Z) - Closing the Gap Between the Upper Bound and the Lower Bound of Adam's
Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
arXiv Detail & Related papers (2023-10-27T09:16:58Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Ensemble linear interpolators: The role of ensembling [5.135730286836428]
Interpolators are unstable; for example, the minimum $\ell_2$-norm least squares interpolator exhibits unbounded test error when dealing with noisy data.
We study how ensembling stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator.
arXiv Detail & Related papers (2023-09-06T20:38:04Z) - Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings (a rough sketch of batch-partitioned minimum-norm estimation appears after this list).
arXiv Detail & Related papers (2023-06-14T11:02:08Z) - Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty [5.799808780731661]
We introduce a modified version of the excess risk.
It can be used to obtain tighter, fast-rate PAC-Bayesian generalisation bounds.
We empirically evaluate these new bounds on a number of real-world datasets.
arXiv Detail & Related papers (2022-10-20T14:14:52Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Super fast rates in structured prediction [88.99819200562784]
We show how to leverage the fact that discrete problems essentially amount to predicting a discrete output, whereas continuous problems amount to predicting a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression where we improve known rates in $n^{-1/4}$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees for the excess risk in adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
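The ensembling and batch-partitioning entries above both stabilize minimum-norm interpolation by combining estimators fit on parts of the data. Below is a minimal sketch of that general idea under illustrative assumptions (isotropic Gaussian covariates, equal-size disjoint batches, no Wiener-style shrinkage); it is not the exact procedure of either paper.
```python
# Minimal sketch (assumptions noted above): average minimum l2-norm
# interpolants fit on disjoint batches and compare with the full-data one.
import numpy as np

rng = np.random.default_rng(1)
n, d, noise, n_batches = 120, 400, 1.0, 4   # illustrative sizes
theta_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + noise * rng.standard_normal(n)

# Single minimum l2-norm interpolant on the full data set.
theta_full = np.linalg.pinv(X) @ y

# Batch estimator: a minimum-norm interpolant per disjoint batch, then average.
batches = np.array_split(np.arange(n), n_batches)
theta_batch = np.mean([np.linalg.pinv(X[idx]) @ y[idx] for idx in batches], axis=0)

for name, th in [("full-data interpolant", theta_full),
                 ("averaged batch interpolant", theta_batch)]:
    print(f"{name:26s} parameter error = {np.sum((th - theta_star) ** 2):.3f}")
```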