Related papers: Extended critical regimes of deep neural networks

Extended critical regimes of deep neural networks

URL: http://arxiv.org/abs/2203.12967v1
Date: Thu, 24 Mar 2022 10:15:50 GMT
Title: Extended critical regimes of deep neural networks
Authors: Cheng Kevin Qu and Asem Wardak and Pulin Gong
Abstract summary: We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We provide a theoretical guide for the design of efficient neural architectures.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing the contraction as well as expansion of internal neural representations and speeding up training processes, hence providing a theoretical guide for the design of efficient neural architectures.

Related papers

Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks [4.098619171200725]
We study the activity of recurrent neural networks (RNNs) with random weights drawn from biologically plausible L'evy alpha-stable distributions.<n>We theoretically predict the gain at which the system transitions from quiescent to chaotic dynamics, and validate it through simulations.<n>Our results reveal a biologically aligned tradeoff between the robustness of dynamics near the edge of chaos and the richness of high-dimensional neural activity.
arXiv Detail & Related papers (2025-05-14T21:35:55Z)
PAC-Bayesian risk bounds for fully connected deep neural network with Gaussian priors [0.0]
We show that fully connected Bayesian neural networks can achieve comparable convergence rates to sparse networks.<n>Our results hold for a broad class of practical activation functions that are Lipschitz continuous.
arXiv Detail & Related papers (2025-05-07T11:42:18Z)
Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization [0.18641315013048293]
Vision Transformers (ViTs) have emerged as a powerful class of models in the field of deep learning for image classification. We propose a novel Random Matrix Theory (RMT)-based method for pruning pre-trained DNNs, based on the sparsification of weights and singular vectors. We demonstrate that our RMT-based pruning can be used to reduce the number of parameters of ViT models by 30-50% with less than 1% loss in accuracy.
arXiv Detail & Related papers (2025-03-02T05:25:20Z)
Continuous Spiking Graph Neural Networks [43.28609498855841]
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) We introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. We provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes.
arXiv Detail & Related papers (2024-04-02T12:36:40Z)
On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective [9.753461673117362]
The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This paper investigates whether the limiting regime predicts practically relevant behavior of large-width architectures.
arXiv Detail & Related papers (2023-09-29T20:51:24Z)
Learning Ability of Interpolating Deep Convolutional Neural Networks [28.437011792990347]
We study the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs) We show that by adding well-defined layers to a non-interpolating DCNN, we can obtain some interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN. Our work provides theoretical verification of how overfitted DCNNs generalize well.
arXiv Detail & Related papers (2022-10-25T17:22:31Z)
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp) In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks. We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have emerged a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data. There has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations. This work delves deep into the intrinsic structures of SNNs, by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z)
Linear Leaky-Integrate-and-Fire Neuron Model Based Spiking Neural Networks and Its Mapping Relationship to Deep Neural Networks [7.840247953745616]
Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability. This paper establishes a precise mathematical mapping between the biological parameters of the Linear Leaky-Integrate-and-Fire model (LIF)/SNNs and the parameters of ReLU-AN/Deep Neural Networks (DNNs)
arXiv Detail & Related papers (2022-05-31T17:02:26Z)
Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network. In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z)
Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs) INNs are a class of implicit learning models that use implicit equations as layers. We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs [2.28438857884398]
Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently. Despite their theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, laying at the heart of their success -- feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
arXiv Detail & Related papers (2021-06-08T05:20:00Z)
Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error. Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking limit of infinite width of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.