Understanding Why Neural Networks Generalize Well Through GSNR of
Parameters
- URL: http://arxiv.org/abs/2001.07384v2
- Date: Mon, 24 Feb 2020 10:47:39 GMT
- Title: Understanding Why Neural Networks Generalize Well Through GSNR of
Parameters
- Authors: Jinlong Liu, Guoqing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
- Abstract summary: We study the gradient signal-to-noise ratio (GSNR) of parameters during the training process of deep neural networks (DNNs).
We show that a larger GSNR during the training process leads to better generalization performance.
- Score: 11.208337921488207
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: As deep neural networks (DNNs) achieve tremendous success across many
application domains, researchers have explored from many angles why they
generalize well. In this paper, we provide a novel perspective on this question
using the gradient signal-to-noise ratio (GSNR) of parameters during the
training process of DNNs. The GSNR of a parameter is defined as the ratio
between its gradient's squared mean and its gradient's variance over the data
distribution. Based on several approximations, we establish a quantitative
relationship between model parameters' GSNR and the generalization gap. This
relationship indicates that a larger GSNR during training leads to better
generalization performance. Moreover, we show that, unlike in shallow models
(e.g. logistic regression, support vector machines), the gradient descent
optimization dynamics of DNNs naturally produce large GSNR during training,
which is probably the key to DNNs' remarkable generalization ability.
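To make the definition concrete, below is a minimal sketch (not the authors' code) of how per-parameter GSNR could be estimated from per-sample gradients; the toy model, data, sample count, and the small epsilon added to the variance are illustrative assumptions.

```python
# Estimate per-parameter GSNR = (mean per-sample gradient)^2 / (variance of per-sample gradient),
# approximating the data-distribution statistics with a finite sample (an assumption for illustration).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data, chosen only to keep the sketch self-contained.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(256, 10)
y = torch.randn(256, 1)
loss_fn = nn.MSELoss()

# Collect one flattened gradient vector per training sample.
per_sample_grads = []
for xi, yi in zip(x, y):
    model.zero_grad()
    loss = loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    loss.backward()
    per_sample_grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))
grads = torch.stack(per_sample_grads)  # shape: (n_samples, n_params)

# GSNR_j = (E[g_j])^2 / Var[g_j]; the epsilon (an assumption) avoids division by zero.
mean = grads.mean(dim=0)
var = grads.var(dim=0, unbiased=False)
gsnr = mean.pow(2) / (var + 1e-12)

print(f"mean GSNR over parameters: {gsnr.mean().item():.4e}")
```

In the paper's terms, parameters whose sample gradients agree in direction (large squared mean relative to variance) have a large GSNR, and the relationship above suggests that training dynamics which keep this ratio large should exhibit a smaller generalization gap.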
Related papers
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks, supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Graph Neural Reaction Diffusion Models [14.164952387868341]
We propose a novel family of reaction GNNs based on neural reaction-diffusion (RD) systems.
We discuss the theoretical properties of our RDGNN and its implementation, and show that it improves upon or offers performance competitive with state-of-the-art methods.
arXiv Detail & Related papers (2024-06-16T09:46:58Z)
- Bifurcations and loss jumps in RNN training [7.937801286897863]
We introduce a novel algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs, along with their existence and stability regions.
Our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior.
arXiv Detail & Related papers (2023-10-26T16:49:44Z)
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient signal-to-noise ratio (GSNR) of a network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
- Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks [32.87980654923361]
Tensor Singular Value Decomposition (t-SVD) has achieved extensive success in multi-channel data representation.
It remains unclear, however, how t-SVD theoretically affects the learning behavior of tensor neural networks (t-NNs).
This paper is the first to answer this question by deriving upper bounds on the generalization error of both standard and adversarially trained t-NNs.
arXiv Detail & Related papers (2023-03-01T03:05:40Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper attributes the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient to train.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNNs) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that accepts some bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
- Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces [23.21591478556582]
We develop a minimax rate analysis to explain why deep neural networks (DNNs) perform better than other standard methods.
This study tries to fill this gap by considering the estimation of a class of non-smooth functions that have singularities on hypersurfaces.
arXiv Detail & Related papers (2020-11-04T12:51:14Z)
- The Surprising Power of Graph Neural Networks with Random Node Initialization [54.4101931234922]
Graph neural networks (GNNs) are effective models for representation learning on relational data.
Standard GNNs are limited in their expressive power, as they cannot distinguish graphs beyond the capability of the Weisfeiler-Leman graph isomorphism test.
In this work, we analyze the expressive power of GNNs with random node initialization (RNI).
We prove that these models are universal, a first such result for GNNs not relying on computationally demanding higher-order properties.
arXiv Detail & Related papers (2020-10-02T19:53:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.