Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas?
- URL: http://arxiv.org/abs/2511.07308v1
- Date: Mon, 10 Nov 2025 17:10:01 GMT
- Title: Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas?
- Authors: Ildus Sadrtdinov, Ekaterina Lobacheva, Ivan Klimov, Mikhail I. Katsnelson, Dmitry Vetrov
- Abstract summary: We develop a framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks. We show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations.
- Score: 4.565724079570854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks, a setting that both reflects practical architectures with normalization layers and permits theoretical analysis. We establish analogies between training hyperparameters (e.g., learning rate, weight decay) and thermodynamic variables such as temperature, pressure, and volume. Starting with a simplified isotropic noise model, we uncover a close correspondence between SGD dynamics and ideal gas behavior, validated through theory and simulation. Extending to training of neural networks, we show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations. This framework provides a principled foundation for interpreting training dynamics and may guide future work on hyperparameter tuning and the design of learning rate schedulers.
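To make the SGD/ideal-gas correspondence concrete, the toy simulation below is a minimal sketch under the isotropic-noise assumption mentioned in the abstract; all variable names and hyperparameter values are illustrative and not taken from the paper. For a scale-invariant loss the gradient is orthogonal to the weights, so the stochastic gradient is modeled as isotropic noise projected onto that orthogonal subspace. The squared weight norm then equilibrates where the inward weight-decay drift balances the outward diffusion from gradient noise.

```python
import numpy as np

# Toy model of SGD on a scale-invariant loss (illustrative, not the paper's code).
# For a scale-invariant f (f(c*w) = f(w) for c > 0), the gradient is orthogonal
# to w, so the stochastic gradient is modeled here as isotropic noise projected
# onto the subspace orthogonal to w.
rng = np.random.default_rng(0)

d = 100        # number of parameters
lr = 0.1       # learning rate (temperature-like role)
wd = 1e-2      # weight decay (confining, pressure-like force)
sigma = 1.0    # gradient-noise scale
steps = 50_000

w = rng.normal(size=d)
sq_norms = []
for t in range(steps):
    g = sigma * rng.normal(size=d)
    g -= (g @ w) / (w @ w) * w            # project noise orthogonal to w
    w = (1.0 - lr * wd) * w - lr * g      # SGD step with decoupled weight decay
    if t >= steps // 2:                   # keep only the (near-)stationary half
        sq_norms.append(w @ w)

# Stationarity balances weight-decay contraction against noise-driven expansion:
#   E||w||^2 * (1 - (1 - lr*wd)^2) = lr^2 * sigma^2 * (d - 1),
# which for small lr*wd gives E||w||^2 ~ lr * sigma^2 * (d - 1) / (2 * wd).
print("empirical mean ||w||^2 :", np.mean(sq_norms))
print("stationary prediction  :", lr * sigma**2 * (d - 1) / (2 * wd))
```

The abstract draws analogies between these hyperparameters and temperature, pressure, and volume; the sketch only reproduces the drift-diffusion balance that fixes the stationary norm, not the paper's full thermodynamic mapping.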
Related papers
- On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models [54.61810451777578]
Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models. Recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploitation in reinforcement fine-tuning.
arXiv Detail & Related papers (2026-02-03T11:14:58Z) - Langevin Flows for Modeling Neural Latent Dynamics [81.81271685018284]
We introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and forces -- to represent both autonomous and non-autonomous processes in neural systems. Our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor.
arXiv Detail & Related papers (2025-07-15T17:57:48Z) - High-order expansion of Neural Ordinary Differential Equations flows [4.4569182855550755]
We introduce Event Transitions, a framework based on high-order differentials that provides a rigorous mathematical description of neural ODE dynamics at event gradients. Our findings contribute to a deeper theoretical foundation for event-triggered neural differential equations and provide a mathematical construct for explaining complex system dynamics.
arXiv Detail & Related papers (2025-04-02T08:57:34Z) - Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation [79.16635054977068]
We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations. In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation.
arXiv Detail & Related papers (2025-03-20T12:28:08Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces novel deep dynamical models designed to represent continuous-time sequences. We train the model using maximum likelihood estimation with Markov chain Monte Carlo. Experimental results on oscillating systems, videos and real-world state sequences (MuJoCo) demonstrate that our model with the learnable energy-based prior outperforms existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Dynamical stability and chaos in artificial neural network trajectories along training [3.379574469735166]
We study the dynamical properties of the training process by analyzing the network trajectories of a shallow neural network through the lens of dynamical systems theory.
We find hints of regular and chaotic behavior depending on the learning rate regime.
This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
arXiv Detail & Related papers (2024-04-08T17:33:11Z) - Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient Prediction [45.84205238554709]
We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions.
We include the Gibbs-Duhem equation explicitly in the loss function for training neural networks (a minimal sketch of such a penalty appears after this list).
arXiv Detail & Related papers (2023-05-31T07:36:45Z) - Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks [0.0]
We introduce a novel training method for machine-learning-based forecasting of chaotic dynamical systems.
The training enforces dynamical invariants--such as the Lyapunov exponent spectrum and fractal dimension--in the systems of interest, enabling longer and more stable forecasts when operating with limited data.
arXiv Detail & Related papers (2023-04-24T00:33:47Z) - Identifying Equivalent Training Dynamics [3.793387630509845]
We develop a framework for identifying conjugate and non-conjugate training dynamics.
By leveraging advances in Koopman operator theory, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent.
We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking.
arXiv Detail & Related papers (2023-02-17T22:15:20Z) - ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z) - Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty [6.151348127802708]
We propose a physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM).
The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system.
arXiv Detail & Related papers (2021-10-16T16:35:12Z)
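As a concrete illustration of the Gibbs-Duhem-informed loss mentioned in the list above, the constraint for a binary mixture at constant temperature and pressure, x1 d(ln gamma1)/dx1 + (1 - x1) d(ln gamma2)/dx1 = 0, can be enforced as a differentiable residual penalty. This is a minimal sketch only: the architecture, batch size, and penalty usage are assumptions for illustration, not that paper's implementation.

```python
import torch

# Hypothetical network mapping composition x1 in (0, 1) to (ln gamma1, ln gamma2).
model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2)
)

def gibbs_duhem_penalty(x1: torch.Tensor) -> torch.Tensor:
    # Gibbs-Duhem constraint for a binary mixture at constant T and P:
    #   x1 * d(ln gamma1)/dx1 + (1 - x1) * d(ln gamma2)/dx1 = 0
    x1 = x1.clone().requires_grad_(True)
    ln_gamma = model(x1)                        # shape (N, 2)
    d1 = torch.autograd.grad(ln_gamma[:, 0].sum(), x1, create_graph=True)[0]
    d2 = torch.autograd.grad(ln_gamma[:, 1].sum(), x1, create_graph=True)[0]
    residual = x1 * d1 + (1.0 - x1) * d2        # vanishes if the constraint holds
    return (residual ** 2).mean()

# The penalty is differentiable in the network weights, so it can be added to an
# ordinary data-fitting loss with some weight and minimized jointly.
x1 = torch.rand(64, 1)
penalty = gibbs_duhem_penalty(x1)
penalty.backward()
print(float(penalty))
```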