The Implicit Bias of Benign Overfitting
- URL: http://arxiv.org/abs/2201.11489v5
- Date: Mon, 17 Apr 2023 08:36:15 GMT
- Title: The Implicit Bias of Benign Overfitting
- Authors: Ohad Shamir
- Abstract summary: benign overfitting is where a predictor perfectly fits noisy training data while attaining near-optimal expected loss.
We show how this can be extended beyond standard linear regression.
We then turn to classification problems, and show that the situation there is much more favorable.
- Score: 31.714928102950584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The phenomenon of benign overfitting, where a predictor perfectly fits noisy
training data while attaining near-optimal expected loss, has received much
attention in recent years, but still remains not fully understood beyond
well-specified linear regression setups. In this paper, we provide several new
results on when one can or cannot expect benign overfitting to occur, for both
regression and classification tasks. We consider a prototypical and rather
generic data model for benign overfitting of linear predictors, where an
arbitrary input distribution of some fixed dimension $k$ is concatenated with a
high-dimensional distribution. For linear regression which is not necessarily
well-specified, we show that the minimum-norm interpolating predictor (that
standard training methods converge to) is biased towards an inconsistent
solution in general, hence benign overfitting will generally not occur.
Moreover, we show how this can be extended beyond standard linear regression,
by an argument proving how the existence of benign overfitting on some
regression problems precludes its existence on other regression problems. We
then turn to classification problems, and show that the situation there is much
more favorable. Specifically, we prove that the max-margin predictor (to which
standard training methods are known to converge in direction) is asymptotically
biased towards minimizing a weighted \emph{squared hinge loss}. This allows us
to reduce the question of benign overfitting in classification to the simpler
question of whether this loss is a good surrogate for the misclassification
error, and use it to show benign overfitting in some new settings.
Related papers
- Regularized Linear Regression for Binary Classification [20.710343135282116]
Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels.
We show that for large enough regularization strength, the optimal weights concentrate around two values of opposite sign.
We observe that in many cases the corresponding "compression" of each weight to a single bit leads to very little loss in performance.
arXiv Detail & Related papers (2023-11-03T23:18:21Z) - Uncertainty Voting Ensemble for Imbalanced Deep Regression [20.176217123752465]
In this paper, we introduce UVOTE, a method for learning from imbalanced data.
We replace traditional regression losses with negative log-likelihood, which also predicts sample-wise aleatoric uncertainty.
We show that UVOTE consistently outperforms the prior art, while at the same time producing better-calibrated uncertainty estimates.
arXiv Detail & Related papers (2023-05-24T14:12:21Z) - Benign-Overfitting in Conditional Average Treatment Effect Prediction
with Linear Regression [14.493176427999028]
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve the consistency except the random assignment, while the IPW-learner converges the risk to zero if the propensity score is known.
arXiv Detail & Related papers (2022-02-10T18:51:52Z) - Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding generalization through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z) - Risk Bounds for Over-parameterized Maximum Margin Classification on
Sub-Gaussian Mixtures [100.55816326422773]
We study the phenomenon of the maximum margin classifier for linear classification problems.
Our results precisely characterize the condition under which benign overfitting can occur.
arXiv Detail & Related papers (2021-04-28T08:25:16Z) - Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware
Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.