Related papers: Investigating generalization capabilities of neural networks by means of loss landscapes and Hessian analysis

Investigating generalization capabilities of neural networks by means of loss landscapes and Hessian analysis

URL: http://arxiv.org/abs/2412.10146v2
Date: Wed, 05 Feb 2025 08:24:13 GMT
Title: Investigating generalization capabilities of neural networks by means of loss landscapes and Hessian analysis
Authors: Nikita Gabdullin,
Abstract summary: This paper studies generalization capabilities of neural networks (NNs) using new and improved PyTorch library Loss Landscape Analysis (LLA)<n>LLA facilitates visualization and analysis of loss landscapes along with the properties of NN Hessian.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies generalization capabilities of neural networks (NNs) using new and improved PyTorch library Loss Landscape Analysis (LLA). LLA facilitates visualization and analysis of loss landscapes along with the properties of NN Hessian. Different approaches to NN loss landscape plotting are discussed with particular focus on normalization techniques showing that conventional methods cannot always ensure correct visualization when batch normalization layers are present in NN architecture. The use of Hessian axes is shown to be able to mitigate this effect, and methods for choosing Hessian axes are proposed. In addition, spectra of Hessian eigendecomposition are studied and it is shown that typical spectra exist for a wide range of NNs. This allows to propose quantitative criteria for Hessian analysis that can be applied to evaluate NN performance and assess its generalization capabilities. Generalization experiments are conducted using ImageNet-1K pre-trained models along with several models trained as part of this study. The experiment include training models on one dataset and testing on another one to maximize experiment similarity to model performance in the Wild. It is shown that when datasets change, the changes in criteria correlate with the changes in accuracy, making the proposed criteria a computationally efficient estimate of generalization ability, which is especially useful for extremely large datasets.

Related papers

Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features [1.2289361708127877]
We propose a novel hierarchical likelihood learning framework for introducing gamma random effects into a Poisson deep neural network. The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects. State-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework.
arXiv Detail & Related papers (2023-10-18T01:54:48Z)
Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
arXiv Detail & Related papers (2023-08-01T15:04:30Z)
GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks. GIT combines the usage of gradient information and invariance transformations. Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs) It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation [6.296104145657063]
Existing neural operators (NOs) do not necessarily perform well for all physics problems. We propose a subfamily of NOs enabling an enhanced empirical approximation of the nonlinear operator mapping wave speed to solution. Our experiments reveal certain surprises in the generalization and the relevance of introducing depth. We conclude by proposing a hypernetwork version of the subfamily of NOs as a surrogate model for the mentioned forward operator.
arXiv Detail & Related papers (2023-01-27T03:02:12Z)
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes [3.808063547958558]
We study the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence.
arXiv Detail & Related papers (2022-09-08T10:30:05Z)
Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models. We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data. We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
Exponentially improved detection and correction of errors in experimental systems using neural networks [0.0]
We introduce the use of two machine learning algorithms to create an empirical model of an experimental apparatus. This is able to reduce the number of measurements necessary for generic optimisation tasks exponentially. We demonstrate both algorithms at the example of detecting and compensating stray electric fields in an ion trap.
arXiv Detail & Related papers (2020-05-18T22:42:11Z)
Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics. We show that batch normalization (BN) can stabilize the training, but sometimes result in the false impression of a local minimum. We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
arXiv Detail & Related papers (2020-02-25T11:40:27Z)
Topologically Densified Distributions [25.140319008330167]
We study regularization in the context of small sample-size learning with over- parameterized neural networks. We impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances.
arXiv Detail & Related papers (2020-02-12T05:25:15Z)
Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective. We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.