Assessing Generalization of SGD via Disagreement
- URL: http://arxiv.org/abs/2106.13799v1
- Date: Fri, 25 Jun 2021 17:53:09 GMT
- Title: Assessing Generalization of SGD via Disagreement
- Authors: Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
- Abstract summary: We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data.
This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
- Score: 71.17788927037081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We empirically show that the test error of deep networks can be estimated by
simply training the same architecture on the same training set but with a
different run of Stochastic Gradient Descent (SGD), and measuring the
disagreement rate between the two networks on unlabeled test data. This builds
on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20,
which requires the second run to be on an altogether fresh training set. We
further theoretically show that this peculiar phenomenon arises from the
\emph{well-calibrated} nature of \emph{ensembles} of SGD-trained models. This
finding not only provides a simple empirical measure to directly predict the
test error using unlabeled test data, but also establishes a new conceptual
connection between generalization and calibration.
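As a rough illustration of the estimation procedure described above, the following sketch (in PyTorch; `model_a`, `model_b`, and `unlabeled_loader` are hypothetical names, assuming two classifiers of the same architecture trained on the same data with independent SGD runs) computes the disagreement rate that the paper uses as a test-error estimate:

```python
import torch

def disagreement_rate(model_a, model_b, unlabeled_loader, device="cpu"):
    """Fraction of unlabeled test points on which two SGD runs disagree.

    Per the paper, this disagreement rate tracks the test error of either
    network, without requiring any test labels.
    """
    model_a.eval()
    model_b.eval()
    disagreements, total = 0, 0
    with torch.no_grad():
        for inputs in unlabeled_loader:  # loader yields unlabeled input batches
            inputs = inputs.to(device)
            preds_a = model_a(inputs).argmax(dim=1)
            preds_b = model_b(inputs).argmax(dim=1)
            disagreements += (preds_a != preds_b).sum().item()
            total += inputs.size(0)
    return disagreements / total
```

Note that the two runs differ only in SGD randomness (e.g., initialization and data ordering), not in training data, which is what distinguishes this estimator from the Nakkiran & Bansal '20 setup, where the second run requires a fresh training set.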
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- On the Variance of Neural Network Training with respect to Test Sets and Distributions [1.994307489466967]
We show that standard CIFAR-10 and ImageNet trainings have little variance in performance on the underlying test-distributions.
We prove that the variance of neural network trainings on their test-sets is a downstream consequence of the class-calibration property discovered by Jiang et al.
Our analysis yields a simple formula which accurately predicts variance for the classification case.
arXiv Detail & Related papers (2023-04-04T16:09:55Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- A Note on "Assessing Generalization of SGD via Disagreement" [38.59619544501593]
We show that the approach suggested might be impractical because a deep ensemble's calibration deteriorates under distribution shift.
The proposed calibration metrics are also equivalent to two metrics introduced by Nixon et al.: 'ACE' and 'SCE'.
arXiv Detail & Related papers (2022-02-03T21:23:34Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Mutual Supervision for Dense Object Detection [37.30539436044029]
We propose a novel supervisory paradigm, termed Mutual Supervision (MuSu).
MuSu defines training samples for the regression head mainly based on classification prediction scores and, in turn, defines samples for the classification head based on localization scores from the regression head.
Experimental results show that the convergence of detectors trained by this mutual supervision is guaranteed and the effectiveness of the proposed method is verified on the challenging MS COCO benchmark.
arXiv Detail & Related papers (2021-09-13T14:04:13Z)
- Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles [38.23896575179384]
We propose a principled and practically effective framework that simultaneously addresses the two tasks.
On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
arXiv Detail & Related papers (2021-06-29T21:32:51Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)