Understanding Generalization in Adversarial Training via the
Bias-Variance Decomposition
- URL: http://arxiv.org/abs/2103.09947v1
- Date: Wed, 17 Mar 2021 23:30:00 GMT
- Title: Understanding Generalization in Adversarial Training via the
Bias-Variance Decomposition
- Authors: Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma
- Abstract summary: We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
- Score: 39.108491135488286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarially trained models exhibit a large generalization gap: they can
interpolate the training set even for large perturbation radii, but at the cost
of large test error on clean samples. To investigate this gap, we decompose the
test risk into its bias and variance components. We find that the bias
increases monotonically with perturbation size and is the dominant term in the
risk. Meanwhile, the variance is unimodal, peaking near the interpolation
threshold for the training set. In contrast, we show that popular explanations
for the generalization gap instead predict the variance to be monotonic, which
leaves an unresolved mystery. We show that the same unimodal variance appears
in a simple high-dimensional logistic regression problem, as well as for
randomized smoothing. Overall, our results highlight the power of bias-variance
decompositions in modern settings: by providing two measurements instead of
one, they can rule out some theories and clarify others.
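The decomposition the abstract refers to can be illustrated numerically. The sketch below (not the paper's code; the learner, data, and hyperparameters are hypothetical stand-ins) trains many regressors on independent draws of a training set and splits the squared-error test risk into a bias term and a variance term:

```python
# Minimal sketch of a bias-variance decomposition of squared-error test risk.
# The learner (polynomial ridge regression) and data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(x)

def fit_ridge(x, y, degree=8, lam=1e-3):
    # Polynomial ridge regression as a stand-in for any learner.
    X = np.vander(x, degree + 1)
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    return w

def predict(w, x, degree=8):
    return np.vander(x, degree + 1) @ w

x_test = np.linspace(-3, 3, 100)
y_test = true_fn(x_test)  # noiseless targets, so no irreducible-noise term

# Train many models on independent training sets; this realizes the
# expectation over training-set draws in the decomposition.
preds = []
for _ in range(50):
    x_tr = rng.uniform(-3, 3, 30)
    y_tr = true_fn(x_tr) + rng.normal(0, 0.3, 30)
    preds.append(predict(fit_ridge(x_tr, y_tr), x_test))
preds = np.array(preds)  # shape (n_models, n_test)

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_test) ** 2)  # (Bias)^2 term
variance = np.mean(preds.var(axis=0))         # Variance term
risk = np.mean((preds - y_test) ** 2)         # average test risk

# For squared loss: risk = bias^2 + variance (noiseless targets here).
print(bias_sq, variance, risk)
```

With squared loss the identity `risk = bias_sq + variance` holds exactly for these empirical averages, which is what makes the two terms separately measurable; the paper studies how each term moves as the adversarial perturbation radius grows.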
Related papers
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Ensembling over Classifiers: a Bias-Variance Perspective [13.006468721874372]
We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers.
We show that conditional estimates necessarily incur an irreducible error.
Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction.
arXiv Detail & Related papers (2022-06-21T17:46:35Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z)
- What causes the test error? Going beyond bias-variance via ANOVA [21.359033212191218]
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
arXiv Detail & Related papers (2020-10-11T05:21:13Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Rethinking Bias-Variance Trade-off for Generalization of Neural Networks [40.04927952870877]
We provide a simple explanation for this trade-off by measuring the bias and variance of neural networks.
We find that variance unimodality occurs robustly for all models we considered.
Deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2020-02-26T07:21:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.