What causes the test error? Going beyond bias-variance via ANOVA
- URL: http://arxiv.org/abs/2010.05170v3
- Date: Wed, 9 Jun 2021 06:46:33 GMT
- Title: What causes the test error? Going beyond bias-variance via ANOVA
- Authors: Licong Lin, Edgar Dobriban
- Abstract summary: Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
- Score: 21.359033212191218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern machine learning methods are often overparametrized, allowing
adaptation to the data at a fine level. This can seem puzzling; in the worst
case, such models do not need to generalize. This puzzle inspired a great
amount of work, arguing when overparametrization reduces test error, in a
phenomenon called "double descent". Recent work aimed to understand in greater
depth why overparametrization is helpful for generalization. This leads to
discovering the unimodality of variance as a function of the level of
parametrization, and to decomposing the variance into that arising from label
noise, initialization, and randomness in the training data to understand the
sources of the error.
In this work we develop a deeper understanding of this area. Specifically, we
propose using the analysis of variance (ANOVA) to decompose the variance in the
test error in a symmetric way, for studying the generalization performance of
certain two-layer linear and non-linear networks. The advantage of the analysis
of variance is that it reveals the effects of initialization, label noise, and
training data more clearly than prior approaches. Moreover, we also study the
monotonicity and unimodality of the variance components. While prior work
studied the unimodality of the overall variance, we study the properties of
each term in variance decomposition.
One key insight is that in typical settings, the interaction between training
samples and initialization can dominate the variance; surprisingly being larger
than their marginal effect. Also, we characterize "phase transitions" where the
variance changes from unimodal to monotone. On a technical level, we leverage
advanced deterministic equivalent techniques for Haar random matrices that,
to our knowledge, have not yet been used in the area. We also verify our
results in numerical simulations and on empirical data examples.
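The symmetric decomposition described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' code or their exact model. It assumes a simplified two-factor setting (training samples and initialization, with noise-free labels), a linear random-features model fitted by ridge regression, and plug-in estimates on a balanced grid of draws, which are biased for finite grids. All names (`fit_predict`, `two_way_anova`, `lam`, the dimensions) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_predict(X, y, W, x_test, lam=0.1):
    """Ridge regression on linear random features X @ W.T; predict at x_test.

    W plays the role of the (fixed, random) first-layer initialization.
    """
    F = X @ W.T                                   # (n, p) feature matrix
    p = W.shape[0]
    beta = np.linalg.solve(F.T @ F + lam * np.eye(p), F.T @ y)
    return float(x_test @ W.T @ beta)

def two_way_anova(preds):
    """Plug-in two-way ANOVA on a balanced grid.

    preds[a, b] is the prediction for dataset a and initialization b.
    """
    grand = preds.mean()
    row = preds.mean(axis=1, keepdims=True)       # average over initializations
    col = preds.mean(axis=0, keepdims=True)       # average over datasets
    return {
        "samples": float(((row - grand) ** 2).mean()),      # main effect of data
        "init": float(((col - grand) ** 2).mean()),         # main effect of init
        "interaction": float(((preds - row - col + grand) ** 2).mean()),
        "total": float(((preds - grand) ** 2).mean()),
    }

rng = np.random.default_rng(0)
d, p, n = 20, 10, 30          # input dim, number of features, training set size
n_s, n_i = 40, 40             # number of dataset draws and initialization draws

theta = rng.normal(size=d) / np.sqrt(d)           # assumed ground-truth signal
x_test = rng.normal(size=d)                       # one fixed test point
datasets = [rng.normal(size=(n, d)) for _ in range(n_s)]
inits = [rng.normal(size=(p, d)) / np.sqrt(d) for _ in range(n_i)]

# Full factorial grid: every dataset paired with every initialization.
preds = np.array([[fit_predict(X, X @ theta, W, x_test) for W in inits]
                  for X in datasets])

components = two_way_anova(preds)
print(components)
```

On a balanced grid the three components sum exactly to the total empirical variance, so comparing `interaction` against `samples` and `init` gives a rough feel for how much of the variance is attributable to the interaction rather than to either factor alone. Adding label noise as a third factor would require a three-way grid and the corresponding higher-order interaction terms.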
Related papers
- In What Ways Are Deep Neural Networks Invariant and How Should We Measure This? [5.757836174655293]
We introduce a family of invariance and equivariance metrics that allows us to quantify these properties in a way that disentangles them from other metrics such as loss or accuracy.
We draw a range of conclusions about invariance and equivariance in deep learning models, ranging from whether initializing a model with pretrained weights has an effect on a trained model's invariance, to the extent to which invariance learned via training can generalize to out-of-distribution data.
arXiv Detail & Related papers (2022-10-07T18:43:21Z)
- Equivariance and Invariance Inductive Bias for Learning from Insufficient Data [65.42329520528223]
We show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing.
We propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotation in conventional IRM.
arXiv Detail & Related papers (2022-07-25T15:26:19Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Regularising for invariance to data augmentation improves supervised learning [82.85692486314949]
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions.
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition [39.108491135488286]
We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
arXiv Detail & Related papers (2021-03-17T23:30:00Z)
- Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.