Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems
- URL: http://arxiv.org/abs/2502.05300v1
- Date: Fri, 07 Feb 2025 20:10:05 GMT
- Title: Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems
- Authors: Liu Ziyin, Yizhou Xu, Tomaso Poggio, Isaac Chuang
- Abstract summary: The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts. We show that parameter symmetry breaking and restoration serve as a unifying mechanism underlying these behaviors. By connecting these hierarchies, we highlight symmetry as a potential fundamental principle in modern AI.
- Score: 2.0383173745487198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamics of learning in modern large AI systems is hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, addressing specific cases. In this paper, we posit that parameter symmetry breaking and restoration serve as a unifying mechanism underlying these behaviors. We synthesize prior observations and show how this mechanism explains three distinct hierarchies in neural networks: learning dynamics, model complexity, and representation formation. By connecting these hierarchies, we highlight symmetry -- a cornerstone of theoretical physics -- as a potential fundamental principle in modern AI.
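To make the abstract's central notion concrete, here is a minimal sketch of what a parameter symmetry is; the toy two-layer linear model and the rescaling transformation are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical two-layer linear "network": y = w2 * (w1 * x).
# Rescaling (w1, w2) -> (a*w1, w2/a) is a parameter symmetry: it changes
# the parameters but leaves the input-output function invariant.
def forward(w1, w2, x):
    return w2 * (w1 * x)

w1, w2, x, a = 0.5, 4.0, 3.0, 10.0
y_original = forward(w1, w2, x)
y_rescaled = forward(a * w1, w2 / a, x)  # symmetry-transformed parameters
assert np.isclose(y_original, y_rescaled)
print(y_original)  # 6.0
```

Symmetry breaking, in this picture, means training dynamics selecting a particular point on such an orbit of equivalent parameters; restoration means the dynamics returning to a symmetric configuration.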
Related papers
- Dynamical symmetries in the fluctuation-driven regime: an application of Noether's theorem to noisy dynamical systems [0.0]
Nonequilibrium physics provides a variational principle that describes how fairly generic noisy dynamical systems are most likely to transition between two states.
We identify analogues of the conservation of energy, momentum, and angular momentum, and briefly discuss examples of each in the context of models of decision-making, recurrent neural networks, and diffusion generative models.
arXiv Detail & Related papers (2025-04-13T23:56:31Z) - Transformer Dynamics: A neuroscientific approach to interpretability of large language models [0.0]
We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers.
We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis.
In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers.
arXiv Detail & Related papers (2025-02-17T18:49:40Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units.
We show that this idea provides performance improvements across a wide spectrum of tasks.
We believe that these empirical results show the importance of our assumptions at the most basic neuronal level of neural representation.
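For context, the classical Kuramoto model that these units build on can be simulated in a few lines; this is a generic sketch of the underlying oscillator dynamics, not the AKOrN architecture, and all constants are illustrative:

```python
import numpy as np

# Kuramoto model: d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i).
rng = np.random.default_rng(0)
N, K, dt, steps = 50, 2.0, 0.01, 2000
omega = rng.normal(0.0, 0.5, N)        # natural frequencies
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases

def order_parameter(theta):
    # r in [0, 1]; r near 1 means the oscillators have synchronized.
    return np.abs(np.exp(1j * theta).mean())

r0 = order_parameter(theta)
for _ in range(steps):
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta = theta + dt * (omega + K * coupling)

print(r0, order_parameter(theta))  # coupling drives the order parameter upward
```

With coupling K above the synchronization threshold, the order parameter grows from its initial near-random value toward 1, which is the collective behavior that threshold units lack.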
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - A spring-block theory of feature learning in deep neural networks [11.396919965037636]
Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry.
We show how this phenomenon emerges from collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics.
We propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.
arXiv Detail & Related papers (2024-07-28T00:07:20Z) - Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models. We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model. We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z) - The Impact of Geometric Complexity on Neural Collapse in Transfer Learning [6.554326244334867]
Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse.
arXiv Detail & Related papers (2024-05-24T16:52:09Z) - Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z) - Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
arXiv Detail & Related papers (2024-02-08T12:31:08Z) - Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z) - Learning reversible symplectic dynamics [0.0]
We propose a new neural network architecture for learning time-reversible dynamical systems from data.
We focus on an adaptation to symplectic systems, because of their importance in physics-informed learning.
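As background for why symplectic structure matters here, a symplectic integrator preserves phase-space volume and keeps energy bounded over long horizons; the following is a generic symplectic Euler sketch for a harmonic oscillator, not the paper's architecture:

```python
# Symplectic Euler for H(q, p) = p**2/2 + q**2/2:
# update the momentum first, then the position using the updated momentum.
def symplectic_euler_step(q, p, dt):
    p = p - dt * q   # p' = -dH/dq
    q = q + dt * p   # q' =  dH/dp, evaluated at the updated p
    return q, p

q, p, dt = 1.0, 0.0, 0.01
for _ in range(10_000):
    q, p = symplectic_euler_step(q, p, dt)
energy = 0.5 * (q * q + p * p)
print(energy)  # remains close to the initial value 0.5, with no secular drift
```

An ordinary (non-symplectic) Euler step applied to the same system would show energy growing steadily, which is the kind of error a symplectic-aware learned model avoids.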
arXiv Detail & Related papers (2022-04-26T14:07:40Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, one that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.