Jaynes Machine: The universal microstructure of deep neural networks
- URL: http://arxiv.org/abs/2310.06960v1
- Date: Tue, 10 Oct 2023 19:22:01 GMT
- Title: Jaynes Machine: The universal microstructure of deep neural networks
- Authors: Venkat Venkatasubramanian, N. Sanjeevrajan, Manasi Khandekar
- Abstract summary: We predict that all highly connected layers of deep neural networks have a universal microstructure of connection strengths that is distributed lognormally ($LN(\mu, \sigma)$).
Under ideal conditions, the theory predicts that $\mu$ and $\sigma$ are the same for all layers in all networks.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a novel theory of the microstructure of deep neural networks.
Using a theoretical framework called statistical teleodynamics, which is a
conceptual synthesis of statistical thermodynamics and potential game theory,
we predict that all highly connected layers of deep neural networks have a
universal microstructure of connection strengths that is distributed
lognormally ($LN({\mu}, {\sigma})$). Furthermore, under ideal conditions, the
theory predicts that ${\mu}$ and ${\sigma}$ are the same for all layers in all
networks. This is shown to be the result of an arbitrage equilibrium where all
connections compete and contribute the same effective utility towards the
minimization of the overall loss function. These surprising predictions are
shown to be supported by empirical data from six large-scale deep neural
networks in real life. We also discuss how these results can be exploited to
reduce the amount of data, time, and computational resources needed to train
large deep neural networks.
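To make the lognormal prediction concrete, below is a minimal Python sketch (not from the paper) of how one might fit $LN(\mu, \sigma)$ to the weight magnitudes of each highly connected layer and compare the fitted parameters across layers. The random stand-in weights, the `fit_lognormal` helper, and the KS check are illustrative assumptions; an actual test would use the dense-layer weight matrices of a trained network, as the authors do for six large-scale models.

```python
# Minimal sketch: fit LN(mu, sigma) to each layer's connection-strength magnitudes
# and compare the fitted parameters across layers (illustrative, not the authors' code).
import numpy as np
from scipy import stats

def fit_lognormal(weights: np.ndarray):
    """Fit LN(mu, sigma) to the magnitudes of one layer's connection strengths."""
    w = np.abs(weights).ravel()
    w = w[w > 0]                                     # lognormal support is strictly positive
    sigma, _, scale = stats.lognorm.fit(w, floc=0)   # scipy: shape = sigma, scale = exp(mu)
    mu = np.log(scale)
    # Kolmogorov-Smirnov statistic as a rough goodness-of-fit indicator
    ks = stats.kstest(w, "lognorm", args=(sigma, 0, scale)).statistic
    return mu, sigma, ks

# Stand-in for the dense layers of a trained network; replace with real weights,
# e.g. the torch.nn.Linear weight tensors of a pretrained model.
rng = np.random.default_rng(0)
layers = [rng.lognormal(mean=-3.0, sigma=1.0, size=(512, 512))
          * rng.choice([-1.0, 1.0], size=(512, 512))
          for _ in range(4)]

for i, W in enumerate(layers):
    mu, sigma, ks = fit_lognormal(W)
    print(f"layer {i}: mu={mu:.3f}, sigma={sigma:.3f}, KS={ks:.3f}")
# Under the theory's ideal conditions, mu and sigma should be nearly the same across layers.
```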
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and the spatial concentration of large weights are the main factors that impact neural persistence (a short illustrative sketch of these two factors follows this list).
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement [12.143300311536201]
We show that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement.
We derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks.
arXiv Detail & Related papers (2023-03-20T16:34:39Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks [33.08636537654596]
We treat the binary weights in a neural network as random variables under rounding and study how their distribution propagates through the layers of the network.
We propose a quasi neural network, a network with continuous parameters and a smooth activation function, to approximate this distribution propagation.
arXiv Detail & Related papers (2022-06-13T06:11:21Z)
- Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks [4.254099382808598]
We develop statistical guarantees for simple neural networks that coincide, up to logarithmic factors, with guarantees for the global optima.
This is a step toward describing the practical properties of neural networks in mathematical terms.
arXiv Detail & Related papers (2022-05-09T18:09:04Z)
- Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study stochastic neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Perceptron Theory Can Predict the Accuracy of Neural Networks [6.136302173351179]
Multilayer neural networks set the current state of the art for many technical classification problems.
But these networks remain, essentially, black boxes when it comes to analyzing them and predicting their performance.
Here, we develop a statistical theory for the one-layer perceptron and show that it can predict the performance of a surprisingly large variety of neural networks.
arXiv Detail & Related papers (2020-12-14T19:02:26Z)
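As a concrete, simplified illustration of the two factors named in the deep graph persistence summary above (weight variance and spatial concentration of large weights), here is a short Python sketch that computes per-layer weight variance and a crude concentration proxy, the share of total absolute-weight mass held by the largest 10% of weights. Both the stand-in layers and the top-10% proxy are illustrative assumptions, not the paper's definitions.

```python
# Illustrative sketch: per-layer weight variance and a simple concentration proxy
# (share of |w| mass in the largest 10% of weights); not the paper's measures.
import numpy as np

def layer_statistics(weights: np.ndarray, top_fraction: float = 0.10):
    """Return (variance, concentration) for one layer's weight matrix."""
    w = np.abs(weights).ravel()
    variance = weights.var()
    k = max(1, int(top_fraction * w.size))
    top_mass = np.sort(w)[-k:].sum()       # mass held by the largest weights
    concentration = top_mass / w.sum()     # share of total |w| mass in the top k
    return variance, concentration

# Stand-in layers; replace with the weight matrices of a trained network.
rng = np.random.default_rng(0)
layers = [rng.normal(0.0, 0.05, size=(256, 256)) for _ in range(3)]

for i, W in enumerate(layers):
    var, conc = layer_statistics(W)
    print(f"layer {i}: variance={var:.5f}, top-10% mass share={conc:.3f}")
```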