Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks
- URL: http://arxiv.org/abs/2307.02284v2
- Date: Tue, 29 Oct 2024 14:57:26 GMT
- Title: Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks
- Authors: Keiichi Tamai, Tsuyoshi Okubo, Truong Vinh Truong Duy, Naotake Natori, Synge Todo
- Abstract summary: Conventional artificial deep neural networks operating near the phase boundary of the signal-propagation dynamics, also known as the edge of chaos, exhibit the universal scaling laws of absorbing phase transitions.
Our numerical results indicate that multilayer perceptrons and convolutional neural networks belong to the mean-field and directed-percolation universality classes, respectively.
- Score: 0.8932296777085644
- License:
- Abstract: We demonstrate that conventional artificial deep neural networks operating near the phase boundary of the signal propagation dynamics, also known as the edge of chaos, exhibit universal scaling laws of absorbing phase transitions in non-equilibrium statistical mechanics. Our numerical results indicate that the multilayer perceptrons and the convolutional neural networks belong to the mean-field and the directed percolation universality classes, respectively. Also, the finite-size scaling is successfully applied, suggesting a potential connection to the depth-width trade-off in deep learning. Furthermore, our analysis of the training dynamics under gradient descent reveals that hyperparameter tuning to the phase boundary is necessary but insufficient for achieving optimal generalization in deep networks. Remarkably, nonuniversal metric factors associated with the scaling laws are shown to play a significant role in concretizing the above observations. These findings highlight the usefulness of the notion of criticality for analyzing the behavior of artificial deep neural networks and offer new insights toward a unified understanding of an essential relationship between criticality and intelligence.
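To make the edge-of-chaos picture referred to in the abstract concrete, below is a minimal NumPy sketch in the spirit of the mean-field signal-propagation literature: two nearby inputs are pushed through the same randomly initialized tanh MLP, and their per-unit distance either decays to zero (ordered phase, with the coinciding-trajectories state acting as an absorbing state), grows toward an O(1) value (chaotic phase), or decays anomalously slowly near the boundary. The tanh architecture, the distance observable, and the parameter values are illustrative assumptions, not the specific networks or order parameters analyzed in the paper.

```python
import numpy as np

def trajectory_distance(width=1024, depth=200, sigma_w=1.0, sigma_b=0.0,
                        eps=1e-3, seed=0):
    """Propagate two nearby inputs through the SAME random tanh MLP and
    record their per-unit distance at every layer.  Ordered phase: the
    distance dies out (the coinciding state is absorbing).  Chaotic phase:
    it grows toward an O(1) value.  Near the boundary it decays slowly."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    x2 = x1 + eps * rng.standard_normal(width)
    dists = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * (sigma_w / np.sqrt(width))
        b = rng.standard_normal(width) * sigma_b
        x1, x2 = np.tanh(W @ x1 + b), np.tanh(W @ x2 + b)
        dists.append(np.linalg.norm(x1 - x2) / np.sqrt(width))
    return np.array(dists)

# With zero bias, the ordered/chaotic boundary of this toy tanh model sits
# at sigma_w = 1, so the sweep brackets the critical value.
for s in (0.8, 1.0, 1.5):
    d = trajectory_distance(sigma_w=s)
    print(f"sigma_w={s}: per-unit distance after {len(d)} layers = {d[-1]:.3e}")
```

Sweeping sigma_w across 1 makes the qualitative difference between the two phases, and the slow decay at the boundary, visible directly in the printed numbers.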
Related papers
- Explosive neural networks via higher-order interactions in curved statistical manifolds [43.496401697112695]
We introduce curved neural networks as a class of prototypical models for studying higher-order phenomena.
We show that these curved neural networks implement a self-regulating process that can accelerate memory retrieval.
arXiv Detail & Related papers (2024-08-05T09:10:29Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
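As a toy, hedged look at the two weight statistics the neural-persistence entry above singles out (the variance of the weights and the spatial concentration of large weights), the following sketch compares a spread-out random weight matrix with one whose mass is concentrated in a few entries. It only illustrates those two quantities; it does not compute neural persistence or deep graph persistence, which are persistent-homology constructions over a weight filtration.

```python
import numpy as np

def weight_statistics(W, top_frac=0.01):
    """Return the variance of the weights and the share of total |w| mass
    carried by the largest `top_frac` fraction of entries (a crude measure
    of how spatially concentrated the large weights are)."""
    w = np.abs(W).ravel()
    k = max(1, int(top_frac * w.size))
    top_mass = np.sort(w)[-k:].sum() / w.sum()
    return W.var(), top_mass

rng = np.random.default_rng(0)
spread = rng.standard_normal((256, 256)) * 0.05            # spread-out weights
concentrated = spread * (rng.random((256, 256)) < 0.02)    # mass in a few entries
for name, W in [("spread-out", spread), ("concentrated", concentrated)]:
    var, mass = weight_statistics(W)
    print(f"{name:>12}: variance = {var:.5f}, top-1% |w| mass = {mass:.2f}")
```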
- Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels [6.144858413112823]
In deep learning, neural networks serve as noisy channels between input data and its representation.
We study a frequently overlooked possibility that neural networks can be intrinsically biased toward optimal channels.
arXiv Detail & Related papers (2022-12-04T05:13:01Z)
- Meta-Principled Family of Hyperparameter Scaling Strategies [9.89901717499058]
We calculate the scalings of dynamical observables -- network outputs, neural tangent kernels, and differentials of neural tangent kernels -- for wide and deep neural networks.
We observe that various infinite-width limits examined in the literature correspond to distinct corners of this interconnected web of scaling strategies.
arXiv Detail & Related papers (2022-10-10T18:00:01Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
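A hedged illustration of the quantity the Rank Diminishing entry above refers to: the numerical rank of a toy random MLP's intermediate feature matrices, measured layer by layer with an SVD cutoff. The architecture, initialization scale, and threshold are arbitrary assumptions; how quickly the rank drops depends strongly on both.

```python
import numpy as np

def numerical_rank(M, rtol=1e-3):
    """Rank estimated from the SVD with a relative singular-value cutoff."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))

def ranks_across_layers(n_samples=256, width=128, depth=30, sigma_w=0.9, seed=0):
    """Push a batch through a random tanh MLP and record the numerical rank
    of the (n_samples x width) feature matrix after every layer."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, width))
    ranks = [numerical_rank(X)]
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * (sigma_w / np.sqrt(width))
        X = np.tanh(X @ W)
        ranks.append(numerical_rank(X))
    return ranks

print(ranks_across_layers())  # typically a gradual decay of the rank with depth
```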
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within a time that is logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of the connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
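A minimal PyTorch sketch of the general idea in the Learning Connectivity entry above: treat a block's nodes as a complete DAG and attach a learnable, differentiable gate to every edge, so gradient descent can shape the connectivity together with the node operations. The module names, sizes, and sigmoid gating are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DenseDAGBlock(nn.Module):
    """Toy block whose nodes form a complete DAG: each node receives a
    learnable-gated sum of the block input and all earlier node outputs,
    so the connectivity pattern is trained along with the node operations."""
    def __init__(self, n_nodes=4, channels=16):
        super().__init__()
        self.ops = nn.ModuleList(
            [nn.Sequential(nn.Linear(channels, channels), nn.ReLU())
             for _ in range(n_nodes)]
        )
        # edge_logits[i, j] gates the edge from producer j into node i,
        # where producer 0 is the block input and producer j >= 1 is the
        # output of node j-1; entries with j > i are simply unused.
        self.edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))

    def forward(self, x):
        outputs = [x]
        for i, op in enumerate(self.ops):
            gates = torch.sigmoid(self.edge_logits[i, : i + 1])  # one gate per incoming edge
            agg = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(op(agg))
        return outputs[-1]

block = DenseDAGBlock()
y = block(torch.randn(8, 16))
print(y.shape, block.edge_logits.requires_grad)  # torch.Size([8, 16]) True
```

Because the gates are ordinary parameters, they receive gradients from any downstream loss, which is the sense in which the connectivity itself becomes learnable.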
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework for analyzing deep learning optimization in connection with its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean-field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relationship between a network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent, and easily measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)