Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
- URL: http://arxiv.org/abs/2002.11328v3
- Date: Tue, 8 Dec 2020 03:10:44 GMT
- Title: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
- Authors: Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma
- Abstract summary: We provide a simple explanation for why larger models often generalize better by measuring the bias and variance of neural networks.
We find that variance unimodality occurs robustly for all models we considered.
Deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
- Score: 40.04927952870877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical bias-variance trade-off predicts that bias decreases and
variance increases with model complexity, leading to a U-shaped risk curve.
Recent work calls this into question for neural networks and other
over-parameterized models, for which it is often observed that larger models
generalize better. We provide a simple explanation for this by measuring the
bias and variance of neural networks: while the bias is monotonically
decreasing as in the classical theory, the variance is unimodal or bell-shaped:
it increases then decreases with the width of the network. We vary the network
architecture, loss function, and choice of dataset and confirm that variance
unimodality occurs robustly for all models we considered. The risk curve is the
sum of the bias and variance curves and displays different qualitative shapes
depending on the relative scale of bias and variance, with the double descent
curve observed in recent literature as a special case. We corroborate these
empirical results with a theoretical analysis of two-layer linear networks with
random first layer. Finally, evaluation on out-of-distribution data shows that
most of the drop in accuracy comes from increased bias while variance increases
by a relatively small amount. Moreover, we find that deeper models decrease
bias and increase variance for both in-distribution and out-of-distribution
data.
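To make the measured quantities concrete: under squared loss, the risk decomposes as risk = bias^2 + variance + irreducible noise, and the bias and variance terms can be estimated by averaging over models trained on independent draws of the training data. Below is a minimal sketch of that style of estimator, assuming a toy polynomial regressor as a stand-in for a network of varying capacity; the setup and names are illustrative, not the paper's protocol. This toy model recovers the classical U-shaped picture, whereas the paper's finding is that for wide neural networks the variance curve eventually turns back down.

```python
# Minimal sketch (illustrative, not the paper's exact estimator): under squared
# loss, risk decomposes as bias^2 + variance + irreducible noise. We estimate
# bias^2 and variance by training many copies of a model on independent training
# sets and averaging their predictions on a fixed test grid. A polynomial
# regressor stands in for a neural network; its degree plays the role of width.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(3 * x)

def bias_variance(degree, n_train=30, n_trials=100, noise_std=0.3):
    x_test = np.linspace(-1.0, 1.0, 200)
    y_test = true_fn(x_test)                       # noiseless targets
    preds = []
    for _ in range(n_trials):
        x_tr = rng.uniform(-1.0, 1.0, n_train)
        y_tr = true_fn(x_tr) + noise_std * rng.standard_normal(n_train)
        model = np.polynomial.Polynomial.fit(x_tr, y_tr, degree)
        preds.append(model(x_test))
    preds = np.stack(preds)                        # shape (n_trials, n_test)
    mean_pred = preds.mean(axis=0)                 # ensemble-average prediction
    bias_sq = np.mean((mean_pred - y_test) ** 2)   # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))          # variance, averaged over x
    return bias_sq, variance

for degree in (1, 3, 5, 10, 15):
    b2, v = bias_variance(degree)
    print(f"degree={degree:2d}  bias^2={b2:.4f}  variance={v:.4f}  risk~{b2 + v:.4f}")
```

In the paper's setting one would swap the polynomial for a neural network of a given width trained on random splits of the training set, and sweep width rather than degree.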
Related papers
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show that the second descent appears exactly when and where the transition from one complexity axis to another occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that, for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level (a generic per-sample computation is sketched after this list).
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, which sets a significant subset of a neural network's parameters to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce bias, the relationship between neural network pruning and induced bias is not well understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures the consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition [39.108491135488286]
We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
arXiv Detail & Related papers (2021-03-17T23:30:00Z)
- Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z)
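Relatedly, the sample-level alignment noted above for "It's an Alignment, Not a Trade-off" can be probed with a generic per-example decomposition under squared loss on one-hot labels. The sketch below is illustrative and not necessarily that paper's exact estimator; the function and variable names are made up for the example.

```python
# Minimal sketch (illustrative): per-sample bias^2 and variance of an ensemble
# of classifiers under squared loss on one-hot labels. The pair
# (bias_sq[i], variance[i]) can then be compared across samples to check alignment.
import numpy as np

def per_sample_bias_variance(probs, labels, num_classes):
    """probs: (n_models, n_samples, n_classes) predicted probabilities;
    labels: (n_samples,) integer class labels."""
    one_hot = np.eye(num_classes)[labels]                # (n_samples, n_classes)
    mean_pred = probs.mean(axis=0)                       # ensemble-average prediction
    bias_sq = ((mean_pred - one_hot) ** 2).sum(axis=1)   # per-sample squared bias
    variance = ((probs - mean_pred) ** 2).sum(axis=2).mean(axis=0)  # per-sample variance
    return bias_sq, variance

# Tiny synthetic example: 3 models, 4 samples, 2 classes.
rng = np.random.default_rng(1)
logits = rng.standard_normal((3, 4, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
labels = np.array([0, 1, 1, 0])
b2, var = per_sample_bias_variance(probs, labels, num_classes=2)
print(np.corrcoef(b2, var)[0, 1])   # sample-level correlation between bias^2 and variance
```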