Neural Thermodynamics I: Entropic Forces in Deep and Universal Representation Learning
- URL: http://arxiv.org/abs/2505.12387v1
- Date: Sun, 18 May 2025 12:25:42 GMT
- Title: Neural Thermodynamics I: Entropic Forces in Deep and Universal Representation Learning
- Authors: Liu Ziyin, Yizhou Xu, Isaac Chuang,
- Abstract summary: We propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent. We show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates.
- Score: 0.30723404270319693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid discovery of emergent phenomena in deep learning and large language models, explaining and understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent (SGD) and its variants. Building on the theory of parameter symmetries and an entropic loss landscape, we show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates. These forces systematically break continuous parameter symmetries and preserve discrete ones, leading to a series of gradient balance phenomena that resemble the equipartition property of thermal systems. These phenomena, in turn, (a) explain the universal alignment of neural representations between AI models and lead to a proof of the Platonic Representation Hypothesis, and (b) reconcile the seemingly contradictory observations of sharpness- and flatness-seeking behavior of deep learning optimization. Our theory and experiments demonstrate that a combination of entropic forces and symmetry breaking is key to understanding emergent phenomena in deep learning.
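To make the abstract's claim about symmetry breaking and gradient balance concrete, here is a minimal numerical sketch. It is not the paper's construction: the toy loss, the noise model (noisy targets standing in for minibatch noise), and all hyperparameters are illustrative assumptions. The loss L(u, v) = 0.5*(u*v - y)^2 is invariant under the continuous rescaling (u, v) -> (c*u, v/c), and gradient flow conserves the charge Q = u^2 - v^2, so an unbalanced initialization stays unbalanced. With discrete, noisy updates, each step multiplies Q exactly by (1 - lr^2 * r^2), where r = u*v - y is the residual, so Q decays toward zero: stochasticity plus discreteness, not the loss gradient itself, drive the parameters to the balanced point where the per-parameter gradient magnitudes equalize, an equipartition-like effect of the kind the abstract describes.

```python
# Minimal toy sketch (illustrative assumptions, not the paper's derivation):
# L(u, v) = 0.5 * (u*v - y)^2 has the rescaling symmetry (u, v) -> (c*u, v/c).
# Gradient flow conserves Q = u^2 - v^2; discrete noisy updates instead
# shrink Q by a factor (1 - lr^2 * r^2) per step, driving gradient balance.
import numpy as np

rng = np.random.default_rng(0)
u, v = 3.0, 0.1                  # deliberately unbalanced start: Q = 8.99
lr, sigma, steps = 0.05, 0.3, 20000

for t in range(steps):
    y = 1.0 + sigma * rng.standard_normal()   # noisy target ~ minibatch noise
    r = u * v - y                             # residual
    gu, gv = r * v, r * u                     # dL/du, dL/dv
    u, v = u - lr * gu, v - lr * gv           # one discrete update
    if t % 4000 == 0 or t == steps - 1:
        print(f"step {t:6d}  clean loss {0.5 * (u * v - 1.0) ** 2:.4f}  "
              f"Q = u^2 - v^2 = {u * u - v * v:+.4f}  "
              f"|dL/du| / |dL/dv| = {abs(v) / abs(u):.3f}")
# Expected trend: the clean loss plateaus early, while Q drifts toward 0 and
# the gradient-magnitude ratio approaches 1 (the balanced solution selected
# from the symmetry orbit).
```

In this toy the entropic effect is visible directly in the update rule; the paper's claim is that analogous forces, generated by parameter symmetries together with SGD noise and finite step sizes, shape representation learning in full networks.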
Related papers
- Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
arXiv Detail & Related papers (2025-06-26T22:40:30Z) - Models of Heavy-Tailed Mechanistic Universality [62.107333654304014]
We propose a family of random matrix models to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power-law tails arise through a combination of three independent factors. Implications of our model for other appearances of heavy tails, including neural scaling laws, trajectories, and the five-plus-one phases of neural network training, are discussed.
arXiv Detail & Related papers (2025-06-04T00:55:01Z) - On the generic increase of entropy in isolated systems [2.1101683446471227]
This study establishes a universal mechanism for entropy production in isolated quantum systems governed by the eigenstate thermalization hypothesis (ETH). By developing a resolvent-based framework, we demonstrate that steady-state entropy generically arises from many-body interactions, independent of specific coupling details.
arXiv Detail & Related papers (2025-05-29T03:28:27Z) - Observable-manifested correlations in many-body quantum chaotic systems [5.009081786741903]
We find that for realistic systems, the envelope function of off-diagonal elements of observables exhibits an exponential decay at large $\Delta E$, while for randomized models, it tends to be flat. We demonstrate that the correlations of chaotic eigenstates, originating from the delicate structures of Hamiltonians, play a crucial role in the non-trivial structure of the envelope function.
arXiv Detail & Related papers (2025-02-24T06:33:22Z) - Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems [2.0383173745487198]
The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts. We show that parameter symmetry breaking and restoration serve as a unifying mechanism underlying these behaviors. By connecting these hierarchies, we highlight symmetry as a potential fundamental principle in modern AI.
arXiv Detail & Related papers (2025-02-07T20:10:05Z) - Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics [61.70424540412608]
We present a physics-informed graph ODE for a wide range of entropy-increasing dynamic systems. We show that our formulation provably satisfies entropy non-decrease, in keeping with the physical laws. Empirical results show the superiority of Pioneer on real datasets.
arXiv Detail & Related papers (2025-02-05T14:54:30Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. We introduce Artificial Kuramoto Oscillatory Neurons together with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models. We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model. We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z) - Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent [8.347295051171525]
We show that gradient noise creates a systematic interplay of parameters $\theta$ along the degenerate direction toward a unique, initialization-independent fixed point $\theta^*$.
These points are referred to as noise equilibria because, at these points, noise contributions from different directions are balanced and aligned.
We show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks.
arXiv Detail & Related papers (2024-02-11T13:00:04Z) - TANGO: Time-Reversal Latent GraphODE for Multi-Agent Dynamical Systems [43.39754726042369]
We propose a simple-yet-effective self-supervised regularization term as a soft constraint that aligns the forward and backward trajectories predicted by a continuous graph neural network-based ordinary differential equation (GraphODE).
It effectively imposes time-reversal symmetry to enable more accurate model predictions across a wider range of dynamical systems under classical mechanics.
Experimental results on a variety of physical systems demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-10-10T08:52:16Z) - A duality connecting neural network and cosmological dynamics [0.0]
We show that the dynamics of neural networks trained with gradient descent and the dynamics of scalar fields in a flat, vacuum energy dominated Universe are structurally related.
This duality provides the framework for synergies between these systems, to understand and explain neural network dynamics.
arXiv Detail & Related papers (2022-02-22T19:00:01Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Why Adversarial Interaction Creates Non-Homogeneous Patterns: A Pseudo-Reaction-Diffusion Model for Turing Instability [10.933825676518195]
We observe Turing-like patterns in a system of neurons with adversarial interaction.
We present a pseudo-reaction-diffusion model to explain the mechanism that may underlie these phenomena.
arXiv Detail & Related papers (2020-10-01T16:09:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.