Understanding Generalization in Adversarial Training via the
Bias-Variance Decomposition
- URL: http://arxiv.org/abs/2103.09947v1
- Date: Wed, 17 Mar 2021 23:30:00 GMT
- Title: Understanding Generalization in Adversarial Training via the
Bias-Variance Decomposition
- Authors: Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma
- Abstract summary: We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
- Score: 39.108491135488286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarially trained models exhibit a large generalization gap: they can
interpolate the training set even for large perturbation radii, but at the cost
of large test error on clean samples. To investigate this gap, we decompose the
test risk into its bias and variance components. We find that the bias
increases monotonically with perturbation size and is the dominant term in the
risk. Meanwhile, the variance is unimodal, peaking near the interpolation
threshold for the training set. In contrast, we show that popular explanations
for the generalization gap instead predict the variance to be monotonic, which
leaves an unresolved mystery. We show that the same unimodal variance appears
in a simple high-dimensional logistic regression problem, as well as for
randomized smoothing. Overall, our results highlight the power of bias-variance
decompositions in modern settings: by providing two measurements instead of
one, they can rule out some theories and clarify others.
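The decomposition the abstract refers to can be illustrated numerically. The sketch below (not the paper's code; the learner, data, and hyperparameters are hypothetical stand-ins) trains many regressors on independent draws of a training set and splits the squared-error test risk into a bias term and a variance term:

```python
# Minimal sketch of a bias-variance decomposition of squared-error test risk.
# The learner (polynomial ridge regression) and data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(x)

def fit_ridge(x, y, degree=8, lam=1e-3):
    # Polynomial ridge regression as a stand-in for any learner.
    X = np.vander(x, degree + 1)
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    return w

def predict(w, x, degree=8):
    return np.vander(x, degree + 1) @ w

x_test = np.linspace(-3, 3, 100)
y_test = true_fn(x_test)  # noiseless targets, so no irreducible-noise term

# Train many models on independent training sets; this realizes the
# expectation over training-set draws in the decomposition.
preds = []
for _ in range(50):
    x_tr = rng.uniform(-3, 3, 30)
    y_tr = true_fn(x_tr) + rng.normal(0, 0.3, 30)
    preds.append(predict(fit_ridge(x_tr, y_tr), x_test))
preds = np.array(preds)  # shape (n_models, n_test)

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_test) ** 2)  # (Bias)^2 term
variance = np.mean(preds.var(axis=0))         # Variance term
risk = np.mean((preds - y_test) ** 2)         # average test risk

# For squared loss: risk = bias^2 + variance (noiseless targets here).
print(bias_sq, variance, risk)
```

With squared loss the identity `risk = bias_sq + variance` holds exactly for these empirical averages, which is what makes the two terms separately measurable; the paper studies how each term moves as the adversarial perturbation radius grows.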
Related papers
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Ensembling over Classifiers: a Bias-Variance Perspective [13.006468721874372]
We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers.
We show that conditional estimates necessarily incur an irreducible error.
Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction.
arXiv Detail & Related papers (2022-06-21T17:46:35Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z)
- What causes the test error? Going beyond bias-variance via ANOVA [21.359033212191218]
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
arXiv Detail & Related papers (2020-10-11T05:21:13Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Rethinking Bias-Variance Trade-off for Generalization of Neural Networks [40.04927952870877]
We provide a simple explanation for this trade-off by measuring the bias and variance of neural networks.
We find that variance unimodality occurs robustly for all models we considered.
Deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2020-02-26T07:21:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.