Deep Neural Nets as Hamiltonians
- URL: http://arxiv.org/abs/2503.23982v2
- Date: Sat, 05 Apr 2025 09:41:03 GMT
- Title: Deep Neural Nets as Hamiltonians
- Authors: Mike Winer, Boris Hanin
- Abstract summary: Much prior work in deep learning theory analyzes the distribution of network outputs at a fixed set of inputs. We view a randomly initialized Multi-Layer Perceptron (MLP) as a Hamiltonian over its inputs. For typical realizations of the network parameters, we study the properties of the energy landscape induced by this Hamiltonian.
- Score: 9.883261192383612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are complex functions of both their inputs and parameters. Much prior work in deep learning theory analyzes the distribution of network outputs at a fixed set of inputs (e.g. a training dataset) over random initializations of the network parameters. The purpose of this article is to consider the opposite situation: we view a randomly initialized Multi-Layer Perceptron (MLP) as a Hamiltonian over its inputs. For typical realizations of the network parameters, we study the properties of the energy landscape induced by this Hamiltonian, focusing on the structure of near-global minima in the limit of infinite width. Specifically, we use the replica trick to perform an exact analytic calculation giving the entropy (log volume of space) at a given energy. We further derive saddle point equations that describe the overlaps between inputs sampled iid from the Gibbs distribution induced by the random MLP. For linear activations we solve these saddle point equations exactly. But we also solve them numerically for a variety of depths and activation functions, including $\tanh, \sin, \text{ReLU}$, and shaped non-linearities. We find even at infinite width a rich range of behaviors. For some non-linearities, such as $\sin$, we find that the landscapes of random MLPs exhibit full replica symmetry breaking, while shallow $\tanh$ and ReLU networks or deep shaped MLPs are instead replica symmetric.
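To make the setup concrete, here is a minimal numpy sketch (not code from the paper): a frozen, randomly initialized MLP plays the role of a Hamiltonian $H(x)$ over its inputs, inputs constrained to a sphere are sampled from the induced Gibbs distribution with a simple Metropolis walk, and the overlap between two independent samples is the order parameter tracked by the replica calculation. The width, depth, inverse temperature, activation, and spherical input constraint are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and inverse temperature (assumptions, not from the paper).
d, n, L, beta = 64, 256, 4, 2.0

# Frozen random MLP with 1/fan_in weight variance; its scalar output is the energy H(x).
Ws = [rng.normal(0.0, 1.0 / np.sqrt(d), (n, d))]
Ws += [rng.normal(0.0, 1.0 / np.sqrt(n), (n, n)) for _ in range(L - 1)]
w_out = rng.normal(0.0, 1.0 / np.sqrt(n), n)

def energy(x):
    """H(x): scalar output of the frozen random MLP (tanh activations assumed)."""
    h = x
    for W in Ws:
        h = np.tanh(W @ h)
    return float(w_out @ h)

def to_sphere(x):
    """Project onto the sphere of radius sqrt(d) (an assumed input constraint)."""
    return x * np.sqrt(d) / np.linalg.norm(x)

def gibbs_sample(steps=20000, step_size=0.3):
    """Metropolis walk targeting p(x) proportional to exp(-beta * H(x)) on the sphere."""
    x = to_sphere(rng.normal(size=d))
    e = energy(x)
    for _ in range(steps):
        x_new = to_sphere(x + step_size * rng.normal(size=d))
        e_new = energy(x_new)
        if rng.random() < np.exp(min(0.0, -beta * (e_new - e))):
            x, e = x_new, e_new
    return x

# Overlap q = x1 . x2 / d between two independent Gibbs samples: the order
# parameter that the replica / saddle-point analysis describes.
x1, x2 = gibbs_sample(), gibbs_sample()
print("overlap q =", float(x1 @ x2) / d)
```

In a replica-symmetric landscape the overlap between independent pairs concentrates on a single value, whereas replica symmetry breaking would show up as a nontrivial spread of overlaps across many such pairs.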
Related papers
- Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications [0.0]
We present results on the formation and stability of a family of fixed points of deep neural networks (DNNs). We demonstrate examples of applications of such networks in supervised, semi-supervised and unsupervised learning.
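As a toy illustration of what a fixed point of a network's input-output map looks like, here is a hedged numpy sketch (an illustration, not the paper's construction): a small random network $f:\mathbb{R}^d \to \mathbb{R}^d$ with deliberately small weights so that $f$ is a contraction, a fixed point located by plain iteration, and its stability checked via the spectral radius of a finite-difference Jacobian. All sizes and the contraction assumption are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical map f: R^d -> R^d (equal input and output width so fixed points make sense).
# Weights are scaled small enough that f is a contraction, so plain iteration converges.
d, depth, scale = 32, 3, 0.4
Ws = [scale * rng.normal(0.0, 1.0 / np.sqrt(d), (d, d)) for _ in range(depth)]
bs = [0.1 * rng.normal(size=d) for _ in range(depth)]

def f(x):
    for W, b in zip(Ws[:-1], bs[:-1]):
        x = np.tanh(W @ x + b)
    return Ws[-1] @ x + bs[-1]  # linear final layer

# Find a fixed point x* = f(x*) by iteration, then check local stability
# via the spectral radius of a finite-difference Jacobian at x*.
x = rng.normal(size=d)
for _ in range(500):
    x = f(x)

eps = 1e-6
J = np.column_stack([(f(x + eps * e) - f(x)) / eps for e in np.eye(d)])
print("residual ||f(x) - x|| =", np.linalg.norm(f(x) - x))
print("spectral radius of Jacobian =", max(abs(np.linalg.eigvals(J))))
```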
arXiv Detail & Related papers (2025-01-07T23:23:26Z)
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets depending on an 'effective depth'.
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks [8.716913598251386]
We find a similar differential-equation-based characterization for two types of unshaped networks.
We derive the first-order correction to the layerwise correlation.
These results together provide a connection between shaped and unshaped network architectures.
arXiv Detail & Related papers (2023-10-18T16:15:10Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
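A hedged sketch of how such set-element embeddings might slot into a DeepSets-style sum-pooling model; the LP/LE layer definitions below are my reading of the summary rather than the paper's code, and all dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up dimensions: a set of m elements, each a d-dimensional feature,
# embedded into k channels before permutation-invariant sum pooling.
m, d, k, p = 10, 5, 16, 2

W = rng.normal(size=(k, d))
b = rng.normal(size=k)

def embed_lp(x):
    """LP-style element embedding: linear map followed by a power activation, (Wx + b)^p."""
    return np.power(W @ x + b, p)

def embed_le(x):
    """LE-style element embedding: linear map followed by an exponential activation, exp(Wx + b)."""
    return np.exp(W @ x + b)

def set_representation(X, embed):
    """Sum-pool the element embeddings; the result ignores the ordering of the rows of X."""
    return sum(embed(x) for x in X)

X = rng.normal(size=(m, d))
out = set_representation(X, embed_lp)
out_permuted = set_representation(X[rng.permutation(m)], embed_lp)
print("permutation invariant:", np.allclose(out, out_permuted))
```

A complete DeepSets model would apply a second network to the pooled vector; the sketch stops at the pooled, permutation-invariant representation.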
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets, on the order of $P^* \sim \sqrt{N}$, for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
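For concreteness, a minimal sketch of $\epsilon$-greedy action selection on top of a two-layer (one hidden layer) ReLU value network; the network here is random rather than trained, and all sizes and the exploration rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sizes: state dimension d, hidden width n, number of actions A.
d, n, A, epsilon = 8, 64, 4, 0.1

# A two-layer (one hidden layer) ReLU network stands in for the learned Q-function.
W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (n, d))
W2 = rng.normal(0.0, 1.0 / np.sqrt(n), (A, n))

def q_values(state):
    """Q(state, a) for all actions a, from the two-layer ReLU network."""
    return W2 @ np.maximum(W1 @ state, 0.0)

def epsilon_greedy_action(state):
    """Explore uniformly with probability epsilon; otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return int(rng.integers(A))
    return int(np.argmax(q_values(state)))

state = rng.normal(size=d)
print("chosen action:", epsilon_greedy_action(state))
```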
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances [9.390008801320024]
We show that adding one extra neuron to each layer is sufficient to connect all previously discrete minima into a single manifold.
We show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold.
arXiv Detail & Related papers (2021-05-25T21:19:07Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We establish an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples [0.0]
We introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts.
We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm.
In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability.
arXiv Detail & Related papers (2021-01-18T16:31:19Z)
- Affine symmetries and neural network identifiability [0.0]
We consider arbitrary nonlinearities with potentially complicated affine symmetries.
We show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$.
arXiv Detail & Related papers (2020-06-21T07:09:30Z)
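As a concrete instance of the symmetries discussed in the last entry above, here is a minimal numpy sketch (an illustration, not the paper's construction): because $\tanh$ is odd, flipping the signs of one hidden unit's incoming weights, bias, and outgoing weight changes the parameters of a one-hidden-layer network without changing the function it computes.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative one-hidden-layer tanh network f(x) = v . tanh(Wx + b); sizes are made up.
d, n = 6, 20
W, b, v = rng.normal(size=(n, d)), rng.normal(size=n), rng.normal(size=n)

def net(x, W, b, v):
    return v @ np.tanh(W @ x + b)

# tanh is odd, so negating one hidden unit's incoming weights, bias, and outgoing
# weight produces different parameters that compute exactly the same function.
W2, b2, v2 = W.copy(), b.copy(), v.copy()
W2[0], b2[0], v2[0] = -W2[0], -b2[0], -v2[0]

x = rng.normal(size=d)
print(np.allclose(net(x, W, b, v), net(x, W2, b2, v2)))  # True
```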