Inconsistency, Instability, and Generalization Gap of Deep Neural
Network Training
- URL: http://arxiv.org/abs/2306.00169v2
- Date: Sun, 29 Oct 2023 13:04:11 GMT
- Title: Inconsistency, Instability, and Generalization Gap of Deep Neural
Network Training
- Authors: Rie Johnson and Tong Zhang
- Abstract summary: We show that inconsistency is a more reliable indicator of generalization gap than the sharpness of the loss landscape.
The results also provide a theoretical basis for existing methods such as co-distillation and ensemble.
- Score: 14.871738070617491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep neural networks are highly expressive, it is important to find
solutions with small generalization gap (the difference between the performance
on the training data and unseen data). Focusing on the stochastic nature of
training, we first present a theoretical analysis in which the bound on the
generalization gap depends on what we call the inconsistency and instability of
model outputs, which can be estimated on unlabeled data. Our empirical study
based on this analysis shows that instability and inconsistency are strongly
predictive of generalization gap in various settings. In particular, our
finding indicates that inconsistency is a more reliable indicator of
generalization gap than the sharpness of the loss landscape. Furthermore, we
show that algorithmic reduction of inconsistency leads to superior performance.
The results also provide a theoretical basis for existing methods such as
co-distillation and ensemble.
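Since the paper's inconsistency and instability quantities are estimated on unlabeled data from the disagreement of independently trained models, a minimal sketch of such an estimate is shown below. The disagreement metric (mean KL divergence between the softmax outputs of two training runs) and the function names are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def inconsistency_estimate(logits_run_a, logits_run_b, eps=1e-12):
    """Mean KL divergence between the predictive distributions of two
    independently trained models, evaluated on unlabeled inputs.
    An illustrative proxy for the paper's inconsistency measure."""
    p = softmax(logits_run_a)
    q = softmax(logits_run_b)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return kl.mean()

# Example: logits from two training runs on 1000 unlabeled examples, 10 classes.
rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 10))
b = a + 0.1 * rng.normal(size=(1000, 10))  # a slightly perturbed second run
print(inconsistency_estimate(a, b))
```

Note that no labels are used anywhere above, which is what makes the quantity practical to monitor on held-out unlabeled data.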
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
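For reference, a common definition of the unhinged loss in the label-noise literature is l(y, f(x)) = 1 - y * f(x) for labels y in {-1, +1}; the sketch below assumes that definition, which may differ in detail from this paper's formulation.

```python
import numpy as np

def unhinged_loss(y, scores):
    """Unhinged loss, assuming the standard definition
    l(y, f(x)) = 1 - y * f(x) with labels in {-1, +1}.
    Unlike the hinge loss max(0, 1 - y * f(x)), it is linear everywhere,
    which is what makes closed-form analysis of training dynamics tractable."""
    return 1.0 - y * scores

y = np.array([1, -1, 1])
scores = np.array([0.8, -0.3, -0.5])
print(unhinged_loss(y, scores))         # per-example losses
print(unhinged_loss(y, scores).mean())  # average loss
```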
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- On the Importance of Feature Separability in Predicting Out-Of-Distribution Error [25.995311155942016]
We propose a dataset-level score based upon feature dispersion to estimate the test accuracy under distribution shift.
Our method is inspired by desirable properties of features in representation learning: high inter-class dispersion and high intra-class compactness.
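A hedged sketch of a dispersion-style score built from those two desiderata appears below: the ratio of between-class scatter to within-class scatter of penultimate-layer features. This Fisher-discriminant-style ratio is an assumption for illustration, not necessarily the paper's exact score.

```python
import numpy as np

def dispersion_score(features, labels):
    """Ratio of inter-class dispersion to intra-class compactness of
    feature vectors. A Fisher-style proxy; the paper's score may differ."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        between += len(fc) * np.sum((mu_c - global_mean) ** 2)
        within += np.sum((fc - mu_c) ** 2)
    return between / max(within, 1e-12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(3, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)
print(dispersion_score(X, y))  # higher = more separable features
```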
arXiv Detail & Related papers (2023-03-27T09:52:59Z)
- Using Focal Loss to Fight Shallow Heuristics: An Empirical Analysis of Modulated Cross-Entropy in Natural Language Inference [0.0]
In some datasets, deep neural networks discover underlying heuristics that allow them to take shortcuts in the learning process, resulting in poor generalization capability.
Instead of using standard cross-entropy, we explore whether a modulated version of cross-entropy called focal loss can constrain the model so as not to rely on such heuristics and improve generalization performance.
Our experiments in natural language inference show that focal loss has a regularizing impact on the learning process, increasing accuracy on out-of-distribution data, but slightly decreasing performance on in-distribution data.
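Focal loss itself is standard; the sketch below shows the binary form FL(p_t) = -(1 - p_t)^gamma * log(p_t), which down-weights well-classified (easy) examples so the model cannot ride a shallow heuristic that solves only those. The gamma value is illustrative.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Binary focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    p_t is the probability assigned to the true class; gamma > 0
    down-weights easy examples relative to plain cross-entropy."""
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

probs = np.array([0.95, 0.6, 0.2])   # predicted P(y = 1)
labels = np.array([1, 1, 0])
print(focal_loss(probs, labels))           # easy example (0.95) is heavily down-weighted
print(focal_loss(probs, labels, gamma=0))  # gamma = 0 recovers cross-entropy
```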
arXiv Detail & Related papers (2022-11-23T22:19:00Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Distribution of Classification Margins: Are All Data Equal? [61.16681488656473]
We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization.
The resulting subset of "high capacity" features is not consistent across different training runs.
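A hedged sketch of the margin-based quantity follows: the per-example classification margin (true-class logit minus the largest other logit) and an area under the empirical margin distribution. The integration and normalization here are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def classification_margins(logits, labels):
    """Per-example margin: true-class logit minus the largest other logit."""
    idx = np.arange(len(labels))
    true = logits[idx, labels]
    masked = logits.copy()
    masked[idx, labels] = -np.inf
    return true - masked.max(axis=1)

def margin_distribution_auc(margins):
    """Area under the empirical CDF of the margin distribution over the
    observed margin range (illustrative normalization)."""
    m = np.sort(margins)
    cdf = np.arange(1, len(m) + 1) / len(m)
    return np.trapz(cdf, m)

rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 10))
labels = rng.integers(0, 10, size=500)
print(margin_distribution_auc(classification_margins(logits, labels)))
```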
arXiv Detail & Related papers (2021-07-21T16:41:57Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
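Under this viewpoint, the regularizer protecting a previous task is a quadratic expansion of that task's loss around its optimum, L_old(theta) ~ L_old(theta*) + 0.5 * (theta - theta*)^T H (theta - theta*); with a diagonal Hessian or Fisher approximation this recovers EWC-style penalties. A minimal sketch under that diagonal assumption:

```python
import numpy as np

def quadratic_penalty(theta, theta_star, diag_hessian, strength=1.0):
    """Second-order Taylor approximation of the old task's loss around
    its optimum theta_star, with a diagonal Hessian (or Fisher) estimate:
        0.5 * strength * sum_i H_ii * (theta_i - theta_star_i)^2
    Adding this to the new task's loss is the regularization-based
    continual-learning recipe (e.g., EWC uses a diagonal Fisher for H)."""
    delta = theta - theta_star
    return 0.5 * strength * np.sum(diag_hessian * delta ** 2)

theta_star = np.array([1.0, -0.5, 2.0])  # parameters after task A
theta = np.array([1.2, -0.4, 1.5])       # parameters while training task B
h_diag = np.array([10.0, 0.1, 5.0])      # curvature: how much task A cares
print(quadratic_penalty(theta, theta_star, h_diag))
```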
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
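Feature averaging here means averaging the model's outputs over a set of input transformations; with a convex loss, Jensen's inequality implies the averaged predictor does no worse on average than predicting from a single random transform, which is the mechanism behind the tightened bound. A minimal sketch, with the linear model and transformation set as illustrative assumptions:

```python
import numpy as np

def feature_average(predict, x, transforms):
    """Average a model's outputs over a set of input transformations
    (e.g., a symmetry group). For a convex loss, Jensen's inequality
    guarantees the averaged prediction is no worse on average than a
    prediction from a single randomly drawn transform."""
    return np.mean([predict(t(x)) for t in transforms], axis=0)

# Illustrative setup: a linear "model" and a reversal augmentation.
w = np.array([0.5, -1.0, 0.25, 2.0])
predict = lambda x: x @ w
transforms = [lambda x: x, lambda x: x[::-1]]  # identity and reversal

x = np.array([1.0, 2.0, 3.0, 4.0])
print(feature_average(predict, x, transforms))
```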
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.