Extended critical regimes of deep neural networks
- URL: http://arxiv.org/abs/2203.12967v1
- Date: Thu, 24 Mar 2022 10:15:50 GMT
- Title: Extended critical regimes of deep neural networks
- Authors: Cheng Kevin Qu and Asem Wardak and Pulin Gong
- Abstract summary: We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) have been successfully applied to many real-world
problems, but a complete understanding of their dynamical and computational
principles is still lacking. Conventional theoretical frameworks for analysing
DNNs often assume random networks with coupling weights obeying Gaussian
statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous
phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random
matrices and non-equilibrium statistical physics, we develop a new type of mean
field theory for DNNs which predicts that heavy-tailed weights enable the
emergence of an extended critical regime without fine-tuning parameters. In
this extended critical regime, DNNs exhibit rich and complex propagation
dynamics across layers. We further elucidate that the extended criticality
endows DNNs with profound computational advantages: balancing the contraction
as well as expansion of internal neural representations and speeding up
training processes, hence providing a theoretical guide for the design of
efficient neural architectures.
Related papers
- Continuous Spiking Graph Neural Networks [43.28609498855841]
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs).
We introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation.
We provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes.
arXiv Detail & Related papers (2024-04-02T12:36:40Z)
- On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective [9.753461673117362]
The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks.
Current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep.
This paper investigates whether the limiting regime predicts practically relevant behavior of large-width architectures.
arXiv Detail & Related papers (2023-09-29T20:51:24Z)
- Learning Ability of Interpolating Deep Convolutional Neural Networks [28.437011792990347]
We study the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs).
We show that by adding well-defined layers to a non-interpolating DCNN, we can obtain some interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN.
Our work provides theoretical verification of how overfitted DCNNs generalize well.
arXiv Detail & Related papers (2022-10-25T17:22:31Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has focused on typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have seen a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data.
There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations.
This work delves deep into the intrinsic structures of SNNs, by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z)
- Linear Leaky-Integrate-and-Fire Neuron Model Based Spiking Neural Networks and Its Mapping Relationship to Deep Neural Networks [7.840247953745616]
Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability.
This paper establishes a precise mathematical mapping between the biological parameters of the Linear Leaky-Integrate-and-Fire (LIF) model/SNNs and the parameters of ReLU-AN/Deep Neural Networks (DNNs).
arXiv Detail & Related papers (2022-05-31T17:02:26Z)
- Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs [2.28438857884398]
Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently.
Despite its theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, one lying at the heart of their success: feature learning.
Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
arXiv Detail & Related papers (2021-06-08T05:20:00Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)