Tangent Space Sensitivity and Distribution of Linear Regions in ReLU Networks
- URL: http://arxiv.org/abs/2006.06780v1
- Date: Thu, 11 Jun 2020 20:02:51 GMT
- Title: Tangent Space Sensitivity and Distribution of Linear Regions in ReLU Networks
- Authors: Bálint Daróczy
- Abstract summary: We consider adversarial stability in the tangent space and suggest tangent sensitivity in order to characterize stability.
We derive several easily computable bounds and empirical measures for feed-forward fully connected ReLU networks.
Our experiments suggest that even simple bounds and measures are associated with the empirical generalization gap.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent articles indicate that deep neural networks are efficient
models for various learning problems. However, they are often highly sensitive
to changes that an independent observer cannot detect. As our understanding of
deep neural networks through traditional generalization bounds remains
incomplete, several measures have been proposed that capture the behaviour of
a model under small changes at a specific state. In this
paper we consider adversarial stability in the tangent space and suggest
tangent sensitivity in order to characterize stability. We focus on a
particular kind of stability with respect to changes in parameters that are
induced by individual examples without known labels. We derive several easily
computable bounds and empirical measures for feed-forward fully connected ReLU
(Rectified Linear Unit) networks and connect tangent sensitivity to the
distribution of the activation regions in the input space realized by the
network. Our experiments suggest that even simple bounds and measures are
associated with the empirical generalization gap.
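The two central objects of the abstract, the activation pattern that identifies the linear region an input falls into, and the sensitivity of the output to parameter perturbations induced by a single example, can be illustrated with a minimal sketch. The function names and the finite-difference proxy below are hypothetical illustrations, not the paper's exact definitions or bounds.

```python
import numpy as np

def forward_relu(x, weights):
    """Forward pass through a small fully connected ReLU net.
    Returns the scalar output and the activation pattern (the sign
    vector that identifies the linear region containing x)."""
    pattern = []
    h = x
    for W in weights[:-1]:
        z = W @ h
        pattern.append(z > 0)      # which units are active at x
        h = np.maximum(z, 0.0)     # ReLU
    out = weights[-1] @ h          # linear output layer
    return out.item(), pattern

def tangent_sensitivity(x, weights, eps=1e-6):
    """Crude per-example sensitivity proxy: the norm of the gradient
    of the output with respect to ALL parameters, estimated by
    finite differences (a hypothetical measure for illustration)."""
    f0, _ = forward_relu(x, weights)
    grads = []
    for i, W in enumerate(weights):
        g = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            Wp = [w.copy() for w in weights]
            Wp[i][idx] += eps
            f_plus, _ = forward_relu(x, Wp)
            g[idx] = (f_plus - f0) / eps
        grads.append(g)
    return np.sqrt(sum((g ** 2).sum() for g in grads))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
x = rng.standard_normal(3)
s = tangent_sensitivity(x, weights)
```

Two inputs with the same activation pattern lie in the same linear region, where the network is an affine map; how such regions are distributed over the input space is what the paper relates to tangent sensitivity.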
Related papers
- Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore the parallels between learning dynamics and physical systems in and out of equilibrium.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
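For reference, a standard SGLD step adds Gaussian noise scaled to match the Langevin diffusion on top of the gradient step; the paper's variant changes how minibatches are drawn, not this update rule. A generic sketch (not the cited paper's implementation):

```python
import numpy as np

def sgld_step(theta, grad_fn, lr, temperature, rng):
    """One standard SGLD update: a gradient descent step plus
    Gaussian noise with scale sqrt(2 * lr * T), so that the
    long-time stationary distribution is proportional to
    exp(-L(theta) / T)."""
    noise = rng.standard_normal(theta.shape) * np.sqrt(2.0 * lr * temperature)
    return theta - lr * grad_fn(theta) + noise

# Example: sample from a quadratic potential L(theta) = 0.5 * ||theta||^2,
# whose gradient is simply theta.
rng = np.random.default_rng(0)
theta = np.ones(2)
for _ in range(1000):
    theta = sgld_step(theta, lambda t: t, lr=0.01, temperature=0.1, rng=rng)
```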
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
- Quantifying the Variability Collapse of Neural Networks [2.9551667607781607]
The recently discovered Neural Collapse (NC) phenomenon provides a new perspective for understanding the last-layer geometry of neural networks.
We propose a novel metric, named Variability Collapse Index (VCI), to quantify the variability collapse phenomenon in the NC paradigm.
arXiv Detail & Related papers (2023-06-06T06:37:07Z)
- On the Lipschitz Constant of Deep Networks and Double Descent [5.381801249240512]
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable.
We present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent.
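An empirical Lipschitz constant is typically lower-bounded by probing finite differences of the function around sample points. The sketch below uses this common estimator on a toy function; it is an illustration under assumed conventions, not the exact procedure of the cited study.

```python
import numpy as np

def empirical_lipschitz(f, xs, eps=1e-3, rng=None):
    """Lower-bound the Lipschitz constant of f by measuring
    |f(x + d) - f(x)| / ||d|| along random small perturbations d
    around each sample point, and taking the maximum ratio."""
    rng = rng or np.random.default_rng(0)
    best = 0.0
    for x in xs:
        d = rng.standard_normal(x.shape)
        d *= eps / np.linalg.norm(d)       # perturbation of norm eps
        ratio = abs(f(x + d) - f(x)) / eps
        best = max(best, ratio)
    return best

# Example: f(x) = 3 * x[0] is linear with Lipschitz constant exactly 3,
# so the estimate approaches 3 from below as more directions are probed.
rng = np.random.default_rng(1)
xs = [rng.standard_normal(2) for _ in range(200)]
L = empirical_lipschitz(lambda x: 3.0 * x[0], xs)
```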
arXiv Detail & Related papers (2023-01-28T23:22:49Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Asymptotic-Preserving Neural Networks for multiscale hyperbolic models of epidemic spread [0.0]
In many circumstances, the spatial propagation of an infectious disease is characterized by movements of individuals at different scales governed by multiscale PDEs.
In presence of multiple scales, a direct application of PINNs generally leads to poor results due to the multiscale nature of the differential model in the loss function of the neural network.
We consider a new class of AP Neural Networks (APNNs) for multiscale hyperbolic transport models of epidemic spread.
arXiv Detail & Related papers (2022-06-25T11:25:47Z)
- Discretization Invariant Networks for Learning Maps between Neural Fields [3.09125960098955]
We present a new framework for understanding and designing discretization invariant neural networks (DI-Nets).
Our analysis establishes upper bounds on the deviation in model outputs under different finite discretizations.
We prove by construction that DI-Nets universally approximate a large class of maps between integrable function spaces.
arXiv Detail & Related papers (2022-06-02T17:44:03Z)
- The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
- On generalization bounds for deep networks based on loss surface implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite having a large number of parameters, which contradicts classical statistical learning theory.
arXiv Detail & Related papers (2022-01-12T16:41:34Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.