Jaynes Machine: The universal microstructure of deep neural networks
- URL: http://arxiv.org/abs/2310.06960v1
- Date: Tue, 10 Oct 2023 19:22:01 GMT
- Title: Jaynes Machine: The universal microstructure of deep neural networks
- Authors: Venkat Venkatasubramanian, N. Sanjeevrajan, Manasi Khandekar
- Abstract summary: We predict that all highly connected layers of deep neural networks have a universal microstructure of connection strengths that is distributed lognormally ($LN(\mu, \sigma)$).
Under ideal conditions, the theory predicts that $\mu$ and $\sigma$ are the same for all layers in all networks.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a novel theory of the microstructure of deep neural networks.
Using a theoretical framework called statistical teleodynamics, which is a
conceptual synthesis of statistical thermodynamics and potential game theory,
we predict that all highly connected layers of deep neural networks have a
universal microstructure of connection strengths that is distributed
lognormally ($LN({\mu}, {\sigma})$). Furthermore, under ideal conditions, the
theory predicts that ${\mu}$ and ${\sigma}$ are the same for all layers in all
networks. This is shown to be the result of an arbitrage equilibrium where all
connections compete and contribute the same effective utility towards the
minimization of the overall loss function. These surprising predictions are
shown to be supported by empirical data from six large-scale deep neural
networks in real life. We also discuss how these results can be exploited to
reduce the amount of data, time, and computational resources needed to train
large deep neural networks.
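To make the lognormal prediction concrete, below is a minimal Python sketch (not from the paper) of how one might fit $LN(\mu, \sigma)$ to the weight magnitudes of each highly connected layer and compare the fitted parameters across layers. The random stand-in weights, the `fit_lognormal` helper, and the KS check are illustrative assumptions; an actual test would use the dense-layer weight matrices of a trained network, as the authors do for six large-scale models.

```python
# Minimal sketch: fit LN(mu, sigma) to each layer's connection-strength magnitudes
# and compare the fitted parameters across layers (illustrative, not the authors' code).
import numpy as np
from scipy import stats

def fit_lognormal(weights: np.ndarray):
    """Fit LN(mu, sigma) to the magnitudes of one layer's connection strengths."""
    w = np.abs(weights).ravel()
    w = w[w > 0]                                     # lognormal support is strictly positive
    sigma, _, scale = stats.lognorm.fit(w, floc=0)   # scipy: shape = sigma, scale = exp(mu)
    mu = np.log(scale)
    # Kolmogorov-Smirnov statistic as a rough goodness-of-fit indicator
    ks = stats.kstest(w, "lognorm", args=(sigma, 0, scale)).statistic
    return mu, sigma, ks

# Stand-in for the dense layers of a trained network; replace with real weights,
# e.g. the torch.nn.Linear weight tensors of a pretrained model.
rng = np.random.default_rng(0)
layers = [rng.lognormal(mean=-3.0, sigma=1.0, size=(512, 512))
          * rng.choice([-1.0, 1.0], size=(512, 512))
          for _ in range(4)]

for i, W in enumerate(layers):
    mu, sigma, ks = fit_lognormal(W)
    print(f"layer {i}: mu={mu:.3f}, sigma={sigma:.3f}, KS={ks:.3f}")
# Under the theory's ideal conditions, mu and sigma should be nearly the same across layers.
```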
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and the spatial concentration of large weights are the main factors that impact neural persistence (a short illustrative sketch of these two factors follows this list).
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement [12.143300311536201]
We show that a certain locally connected neural network is capable of accurate prediction over a data distribution if and only if the data distribution admits low quantum entanglement.
We derive a preprocessing method for enhancing the suitability of a data distribution to locally connected neural networks.
arXiv Detail & Related papers (2023-03-20T16:34:39Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks [33.08636537654596]
We treat the binary weights in a neural network as random variables under rounding and study how their distribution propagates through the layers of the network.
We propose a quasi neural network, a network with continuous parameters and a smooth activation function, to approximate this distribution propagation.
arXiv Detail & Related papers (2022-06-13T06:11:21Z)
- Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks [4.254099382808598]
We develop statistical guarantees for simple neural networks that coincide, up to logarithmic factors, with guarantees for the global optima.
This is a step toward describing the practical properties of neural networks in mathematical terms.
arXiv Detail & Related papers (2022-05-09T18:09:04Z)
- Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study stochastic neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Perceptron Theory Can Predict the Accuracy of Neural Networks [6.136302173351179]
Multilayer neural networks set the current state of the art for many technical classification problems.
But these networks remain, essentially, black boxes when it comes to analyzing them and predicting their performance.
Here, we develop a statistical theory for the one-layer perceptron and show that it can predict the performance of a surprisingly large variety of neural networks.
arXiv Detail & Related papers (2020-12-14T19:02:26Z)
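As a concrete, simplified illustration of the two factors named in the deep graph persistence summary above (weight variance and spatial concentration of large weights), here is a short Python sketch that computes per-layer weight variance and a crude concentration proxy, the share of total absolute-weight mass held by the largest 10% of weights. Both the stand-in layers and the top-10% proxy are illustrative assumptions, not the paper's definitions.

```python
# Illustrative sketch: per-layer weight variance and a simple concentration proxy
# (share of |w| mass in the largest 10% of weights); not the paper's measures.
import numpy as np

def layer_statistics(weights: np.ndarray, top_fraction: float = 0.10):
    """Return (variance, concentration) for one layer's weight matrix."""
    w = np.abs(weights).ravel()
    variance = weights.var()
    k = max(1, int(top_fraction * w.size))
    top_mass = np.sort(w)[-k:].sum()       # mass held by the largest weights
    concentration = top_mass / w.sum()     # share of total |w| mass in the top k
    return variance, concentration

# Stand-in layers; replace with the weight matrices of a trained network.
rng = np.random.default_rng(0)
layers = [rng.normal(0.0, 0.05, size=(256, 256)) for _ in range(3)]

for i, W in enumerate(layers):
    var, conc = layer_statistics(W)
    print(f"layer {i}: variance={var:.5f}, top-10% mass share={conc:.3f}")
```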