Foolish Crowds Support Benign Overfitting
- URL: http://arxiv.org/abs/2110.02914v1
- Date: Wed, 6 Oct 2021 16:56:37 GMT
- Title: Foolish Crowds Support Benign Overfitting
- Authors: Niladri S. Chatterji and Philip M. Long
- Abstract summary: We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data.
Our analysis exposes the benefit of the "wisdom of the crowd", except here the harm arising from fitting the noise is ameliorated by spreading it among many directions.
- Score: 20.102619493827024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove a lower bound on the excess risk of sparse interpolating procedures
for linear regression with Gaussian data in the overparameterized regime. We
work in a setting where the covariance structure has previously been shown to
be compatible with benign overfitting with fast convergence to the Bayes risk.
We apply the general bound to obtain a lower bound for basis pursuit (the
minimum $\ell_1$-norm interpolant) that implies that its excess risk can
converge at an exponentially slower rate than OLS (the minimum $\ell_2$-norm
interpolant), even when the ground truth is sparse. Our analysis exposes the
benefit of an effect analogous to the "wisdom of the crowd", except here the
harm arising from fitting the noise is ameliorated by spreading it among many
directions - the variance reduction arises from a foolish crowd.
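The contrast in the abstract between the minimum $\ell_1$-norm interpolant (basis pursuit) and the minimum $\ell_2$-norm interpolant can be illustrated numerically. The sketch below uses isotropic Gaussian data with a sparse ground truth, not the specific covariance structure analyzed in the paper, so it only illustrates the two estimators, not the lower bound itself. The names (`w_star`, `w_l1`, `w_l2`) and the noise level are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 20, 100                           # overparameterized: d >> n
w_star = np.zeros(d)
w_star[0] = 1.0                          # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)   # noisy labels

# Minimum l2-norm interpolant ("OLS" in the overparameterized regime).
w_l2 = np.linalg.pinv(X) @ y

# Minimum l1-norm interpolant (basis pursuit) via the standard LP
# reformulation: w = u - v with u, v >= 0, minimize sum(u + v) s.t. Xw = y.
c = np.ones(2 * d)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
w_l1 = res.x[:d] - res.x[d:]

# Both estimators interpolate the training data exactly ...
assert np.allclose(X @ w_l2, y, atol=1e-5)
assert np.allclose(X @ w_l1, y, atol=1e-5)

# ... but they distribute the fit to the noise differently: the l2 solution
# spreads it across many directions, the l1 solution concentrates it on few.
print("nonzeros (|w_i| > 1e-6), min-l2:", int(np.sum(np.abs(w_l2) > 1e-6)))
print("nonzeros (|w_i| > 1e-6), min-l1:", int(np.sum(np.abs(w_l1) > 1e-6)))
print("excess risk proxy ||w - w*||^2, min-l2:", float(np.sum((w_l2 - w_star) ** 2)))
print("excess risk proxy ||w - w*||^2, min-l1:", float(np.sum((w_l1 - w_star) ** 2)))
```

Under isotropic covariance, $\|w - w^*\|^2$ equals the excess risk, which is why it serves as the comparison quantity here; the paper's "foolish crowd" effect is visible in the sparsity counts, since the $\ell_2$ solution uses essentially all $d$ coordinates to absorb the noise while the $\ell_1$ solution uses at most $n$.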
Related papers
- Minimax Linear Regression under the Quantile Risk [31.277788690403522]
We study the problem of designing minimax procedures in linear regression under the quantile risk.
We prove a matching upper bound on the worst-case quantile risk of a variant of the recently proposed min-max regression procedure.
arXiv Detail & Related papers (2024-06-17T23:24:14Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy
Optimization [63.32053223422317]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
In particular, we focus on characterizing the variance over values induced by a distribution over MDPs.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Closing the Gap Between the Upper Bound and the Lower Bound of Adam's
Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
arXiv Detail & Related papers (2023-10-27T09:16:58Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [56.457634640638254]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm suffers only $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Ensemble linear interpolators: The role of ensembling [5.135730286836428]
Interpolators are unstable; for example, the minimum $\ell_2$-norm least squares interpolator exhibits unbounded test error when dealing with noisy data.
We study how ensembling stabilizes and thus improves the otherwise unbounded performance, measured by the out-of-sample prediction risk, of an individual interpolator.
arXiv Detail & Related papers (2023-09-06T20:38:04Z) - Batches Stabilize the Minimum Norm Risk in High Dimensional
Overparameterized Linear Regression [21.83136833217205]
We show that batch partitioning offers useful trade-offs between computational efficiency and performance.
We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk.
Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
arXiv Detail & Related papers (2023-06-14T11:02:08Z) - Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty [5.799808780731661]
We introduce a modified version of the excess risk.
It can be used to obtain tighter, fast-rate PAC-Bayesian generalisation bounds.
We empirically evaluate these new bounds on a number of real-world datasets.
arXiv Detail & Related papers (2022-10-20T14:14:52Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.