A Note on "Assessing Generalization of SGD via Disagreement"
- URL: http://arxiv.org/abs/2202.01851v1
- Date: Thu, 3 Feb 2022 21:23:34 GMT
- Title: A Note on "Assessing Generalization of SGD via Disagreement"
- Authors: Andreas Kirsch, Yarin Gal
- Abstract summary: We show that the approach suggested might be impractical because a deep ensemble's calibration deteriorates under distribution shift.
The proposed calibration metrics are also equivalent to two metrics introduced by Nixon et al. (2019): 'ACE' and 'SCE'.
- Score: 38.59619544501593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Jiang et al. (2021) give empirical evidence that the average test error of
deep neural networks can be estimated via the prediction disagreement of two
separately trained networks. They also provide a theoretical explanation that
this 'Generalization Disagreement Equality' follows from the well-calibrated
nature of deep ensembles under the notion of a proposed 'class-aggregated
calibration'. In this paper we show that the approach suggested might be
impractical because a deep ensemble's calibration deteriorates under
distribution shift, which is exactly when the coupling of test error and
disagreement would be of practical value. We present both theoretical and
experimental evidence, re-deriving the theoretical statements using a simple
Bayesian perspective and showing them to be straightforward and more generic: they
apply to any discriminative model -- not only ensembles whose members output
one-hot class predictions. The proposed calibration metrics are also equivalent
to two metrics introduced by Nixon et al. (2019): 'ACE' and 'SCE'.
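Since the argument turns on class-wise calibration metrics, the following is a minimal NumPy sketch of a binned, class-wise calibration error in the spirit of 'class-aggregated calibration' and Nixon et al.'s SCE. The equal-width binning, the bin count, and the count weighting are illustrative assumptions rather than the exact estimators from either paper.

```python
# Illustrative sketch (not the authors' code): a binned, class-wise calibration
# error in the spirit of class-aggregated calibration / SCE. Binning and
# weighting choices here are assumptions for illustration.
import numpy as np

def class_wise_calibration_error(probs: np.ndarray, labels: np.ndarray,
                                 n_bins: int = 15) -> float:
    """probs: (N, K) predicted class probabilities; labels: (N,) integer labels."""
    n, k = probs.shape
    total = 0.0
    for c in range(k):
        p_c = probs[:, c]                         # predicted probability of class c
        hit = (labels == c).astype(float)         # 1 where class c is the true label
        bins = np.minimum((p_c * n_bins).astype(int), n_bins - 1)
        for b in range(n_bins):
            in_bin = bins == b
            if not in_bin.any():
                continue
            conf = p_c[in_bin].mean()             # mean predicted probability in the bin
            freq = hit[in_bin].mean()             # empirical frequency of class c in the bin
            total += in_bin.sum() / (n * k) * abs(freq - conf)
    return total
```

With the count weighting above the metric reduces to an SCE-style average over classes and bins; other weightings or adaptive bins would give ACE-style variants.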
Related papers
- MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts [25.643876327918544]
Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution samples.
Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias.
We propose MaNo which applies a data-dependent normalization on the logits to reduce prediction bias and takes the $L_p$ norm of the matrix of normalized logits as the estimation score.
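A rough sketch of the general logit-norm idea follows; the softmax stands in for MaNo's data-dependent normalization and the (NK)^(-1/p) scaling is an illustrative choice, so this is not the paper's exact score.

```python
# Hedged sketch of a logit-norm accuracy-estimation score. Softmax is a
# stand-in for the paper's data-dependent normalization; the scaling by
# (N*K)**(1/p) is an illustrative choice so scores are comparable across
# batch sizes.
import numpy as np

def logit_norm_score(logits: np.ndarray, p: float = 4.0) -> float:
    """logits: (N, K) raw model outputs on an unlabeled test set."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True) # normalized logits
    n, k = probs.shape
    return float((probs ** p).sum() ** (1.0 / p) / (n * k) ** (1.0 / p))
```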
arXiv Detail & Related papers (2024-05-29T10:45:06Z) - It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep
Models [51.66015254740692]
We show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Ensembling over Classifiers: a Bias-Variance Perspective [13.006468721874372]
We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers.
We show that conditional estimates necessarily incur an irreducible error.
Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction.
arXiv Detail & Related papers (2022-06-21T17:46:35Z) - Assessing Generalization of SGD via Disagreement [71.17788927037081]
We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data.
This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
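A minimal sketch of the disagreement estimate follows, assuming two independently trained runs of the same architecture and hard (argmax) predictions; model training is omitted and the helper names are hypothetical.

```python
# Illustrative sketch of the disagreement-based test-error estimate: train the
# same architecture twice with different SGD runs, then compare hard
# predictions on unlabeled test data.
import numpy as np

def disagreement_rate(preds_a: np.ndarray, preds_b: np.ndarray) -> float:
    """Fraction of test points where the two runs predict different classes;
    under the Generalization Disagreement Equality this tracks the test error."""
    return float(np.mean(preds_a != preds_b))

def test_error(preds: np.ndarray, labels: np.ndarray) -> float:
    """Ground-truth error for comparison; requires labels."""
    return float(np.mean(preds != labels))

# Hypothetical usage, assuming model_a / model_b are two SGD runs of one architecture:
# est = disagreement_rate(model_a.predict(x_test), model_b.predict(x_test))  # no labels
# err = test_error(model_a.predict(x_test), y_test)                          # labels needed
```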
arXiv Detail & Related papers (2021-06-25T17:53:09Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
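The sketch below illustrates the subfunction view on a tiny random ReLU network: points are grouped by activation pattern, and the group-size-weighted average of per-group errors recovers the overall empirical error. The network, data, and labels are purely illustrative assumptions.

```python
# Rough sketch of the subfunction view for a piecewise-linear (ReLU) network:
# each activation pattern indexes a linear subfunction, and the empirical error
# is the group-size-weighted average of per-pattern errors.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)      # toy 2-8-2 ReLU network
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

x = rng.normal(size=(256, 2))
y = (x[:, 0] > x[:, 1]).astype(int)                # toy labels

h_pre = x @ W1 + b1
pattern = h_pre > 0                                # (N, 8) activation pattern per input
logits = np.maximum(h_pre, 0) @ W2 + b2
errors = (logits.argmax(axis=1) != y).astype(float)

# Group points by activation pattern (each group = one subfunction's domain).
groups = {}
for key, e in zip((tuple(row) for row in pattern), errors):
    groups.setdefault(key, []).append(e)

weighted = sum(len(v) / len(x) * np.mean(v) for v in groups.values())
assert np.isclose(weighted, errors.mean())         # expectation over subfunctions
```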
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Predicting Deep Neural Network Generalization with Perturbation Response
Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - Should Ensemble Members Be Calibrated? [16.331175260764]
Modern deep neural networks are often observed to be poorly calibrated.
Deep learning approaches make use of large numbers of model parameters.
This paper explores the application of calibration schemes to deep ensembles.
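As an illustration of what applying a calibration scheme to an ensemble can look like (not this paper's specific recipe), the sketch below uses temperature scaling fitted by a simple grid search on validation logits, either per member or on the averaged ensemble; the function names and the grid are assumptions.

```python
# Hedged sketch: temperature scaling applied to a deep ensemble, either per
# member before averaging or once on the averaged logits. This is a generic
# illustration, not the paper's experimental setup.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    # Negative log-likelihood of the true labels under the predicted probabilities.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    # Simple grid search for the temperature that minimizes validation NLL.
    return min(grid, key=lambda t: nll(softmax(val_logits / t), val_labels))

def calibrated_ensemble(test_logits, val_logits, val_labels, per_member=True):
    # test_logits / val_logits: lists of (N, K) logit arrays, one per member.
    if per_member:
        temps = [fit_temperature(v, val_labels) for v in val_logits]
        return np.mean([softmax(l / t) for l, t in zip(test_logits, temps)], axis=0)
    tau = fit_temperature(np.mean(val_logits, axis=0), val_labels)
    return softmax(np.mean(test_logits, axis=0) / tau)
```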
arXiv Detail & Related papers (2021-01-13T23:59:00Z) - Generalised Lipschitz Regularisation Equals Distributional Robustness [47.44261811369141]
We give a very general equality result regarding the relationship between distributional robustness and regularisation.
We show a new result explicating the connection between adversarial learning and distributional robustness.
arXiv Detail & Related papers (2020-02-11T04:19:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.