Parameter Symmetry Potentially Unifies Deep Learning Theory
- URL: http://arxiv.org/abs/2502.05300v2
- Date: Fri, 23 May 2025 17:22:54 GMT
- Title: Parameter Symmetry Potentially Unifies Deep Learning Theory
- Authors: Liu Ziyin, Yizhou Xu, Tomaso Poggio, Isaac Chuang
- Abstract summary: We advocate for parameter symmetry as a research direction for unifying AI theories. We argue that this direction of research could lead to a unified understanding of three distinct hierarchies in neural networks.
- Score: 2.0383173745487198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dynamics of learning in modern large AI systems are hierarchical, often characterized by abrupt, qualitative shifts akin to phase transitions observed in physical systems. While these phenomena hold promise for uncovering the mechanisms behind neural networks and language models, existing theories remain fragmented, each addressing only specific cases. In this position paper, we advocate for the crucial role of parameter symmetry research in unifying these fragmented theories. This position is founded on a centralizing hypothesis: parameter symmetry breaking and restoration are the unifying mechanisms underlying the hierarchical learning behavior of AI models. We synthesize prior observations and theories to argue that this direction of research could lead to a unified understanding of three distinct hierarchies in neural networks: learning dynamics, model complexity, and representation formation. By connecting these hierarchies, our position paper elevates symmetry, a cornerstone of theoretical physics, to a potential fundamental principle in modern AI.
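To make the central concept concrete: a "parameter symmetry" is a transformation of a network's weights that leaves its input-output function unchanged. The following minimal numpy sketch (an illustration, not code from the paper) verifies two classic examples in a one-hidden-layer ReLU network: permuting hidden units, and rescaling a unit's incoming weights by c > 0 while rescaling its outgoing weight by 1/c.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # input -> hidden weights
W2 = rng.normal(size=(1, 16))   # hidden -> output weights
x = rng.normal(size=(8, 32))    # a batch of inputs (columns)

def f(W1, W2, x):
    """One-hidden-layer ReLU network."""
    return W2 @ np.maximum(W1 @ x, 0.0)

y = f(W1, W2, x)

# Symmetry 1: permuting hidden units leaves the function invariant.
perm = rng.permutation(16)
assert np.allclose(y, f(W1[perm], W2[:, perm], x))

# Symmetry 2: ReLU is positively homogeneous, so scaling one unit's
# incoming weights by c > 0 and its outgoing weight by 1/c is invariant.
c = 3.7
W1s, W2s = W1.copy(), W2.copy()
W1s[0] *= c
W2s[:, 0] /= c
assert np.allclose(y, f(W1s, W2s, x))
```

In the paper's framing, the breaking and restoration of such symmetries during training is what drives the hierarchical behavior described in the abstract.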
Related papers
- Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence [15.376349115976534]
Artificial intelligence (AI) has made significant strides in solving complex tasks. The current neural network-based paradigm, while effective, is heavily constrained by inherent limitations. A recent paradigm shift in evolutionary understanding has been largely overlooked in the AI literature.
arXiv Detail & Related papers (2025-06-15T15:41:44Z)
- Neural Thermodynamics I: Entropic Forces in Deep and Universal Representation Learning [0.30723404270319693]
We propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with gradient descent. We show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates.
arXiv Detail & Related papers (2025-05-18T12:25:42Z)
- Dynamical symmetries in the fluctuation-driven regime: an application of Noether's theorem to noisy dynamical systems [0.0]
Nonequilibrium physics provides a variational principle that describes how fairly generic noisy dynamical systems are most likely to transition between two states.
We identify analogues of the conservation of energy, momentum, and angular momentum, and briefly discuss examples of each in the context of models of decision-making, recurrent neural networks, and diffusion generative models.
arXiv Detail & Related papers (2025-04-13T23:56:31Z)
- Transformer Dynamics: A neuroscientific approach to interpretability of large language models [0.0]
We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers.
We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis.
In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers.
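As a rough illustration of the kind of measurement this involves (a sketch with stand-in data, not the authors' code), one can stack residual-stream activations per layer and compute, for each unit, the correlation of its activation across adjacent layers:

```python
import numpy as np

# Stand-in residual-stream activations: (layers, tokens, d_model).
# A cumulative sum over layers mimics a stream that accumulates updates.
rng = np.random.default_rng(0)
acts = np.cumsum(rng.normal(size=(12, 100, 64)), axis=0)

def unit_continuity(acts):
    """Per-unit correlation between adjacent layers, averaged over pairs."""
    L, T, D = acts.shape
    corrs = np.empty((L - 1, D))
    for l in range(L - 1):
        a = acts[l] - acts[l].mean(axis=0)
        b = acts[l + 1] - acts[l + 1].mean(axis=0)
        corrs[l] = (a * b).sum(axis=0) / (
            np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))
    return corrs.mean(axis=0)

print(unit_continuity(acts).mean())  # near 1 = strong continuity
```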
arXiv Detail & Related papers (2025-02-17T18:49:40Z)
- Network Dynamics-Based Framework for Understanding Deep Neural Networks [11.44947569206928]
We propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory. We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level. Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases.
arXiv Detail & Related papers (2025-01-05T04:23:21Z)
- Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units.
We show that this idea provides performance improvements across a wide spectrum of tasks.
We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation.
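For background, the classical Kuramoto model that AKOrN builds on is compact enough to state in code. This is a sketch of the base phase dynamics dθᵢ/dt = ωᵢ + (K/N) Σⱼ sin(θⱼ − θᵢ), not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, dt = 64, 1.5, 0.01
theta = rng.uniform(0, 2 * np.pi, N)  # oscillator phases
omega = rng.normal(0, 0.5, N)         # natural frequencies

for _ in range(2000):  # Euler integration of the Kuramoto ODE
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta += dt * (omega + K * coupling)

# Order parameter r in [0, 1]: r near 1 means the phases synchronize.
r = np.abs(np.exp(1j * theta).mean())
print(f"synchrony r = {r:.2f}")
```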
arXiv Detail & Related papers (2024-10-17T17:47:54Z)
- A spring-block theory of feature learning in deep neural networks [11.396919965037636]
Feature-learning deep nets progressively collapse data to a regular low-dimensional geometry.
We show how this phenomenon emerges from the collective action of nonlinearity, noise, learning rate, and other choices that shape the dynamics.
We propose a macroscopic mechanical theory that reproduces the diagram, explaining why some DNNs are lazy and some active, and linking feature learning across layers to generalization.
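As an extremely reduced caricature of the spring-block picture (a toy model constructed here, not the paper's theory), blocks coupled by springs and pinned by static friction advance in intermittent slips when one end is dragged slowly, the kind of stick-slip behavior the authors map onto layerwise feature learning:

```python
import numpy as np

n, k, f_s = 10, 1.0, 1.0        # blocks, spring constant, static friction
x = np.arange(n, dtype=float)   # block positions, unit rest spacing
trail = []

for step in range(500):
    x[0] += 0.02                # drag the first block slowly
    moved = True
    while moved:                # relax until every block is re-pinned
        moved = False
        for i in range(1, n):
            left = k * (x[i - 1] - x[i] + 1)                      # left spring
            right = k * (x[i + 1] - x[i] - 1) if i < n - 1 else 0.0
            force = left + right
            if abs(force) > f_s:        # friction yields: the block slips
                x[i] += force / (2 * k)
                moved = True
    trail.append(x[-1])

print(np.diff(trail).max())     # the last block advances in bursts
```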
arXiv Detail & Related papers (2024-07-28T00:07:20Z)
- Towards a theory of learning dynamics in deep state space models [12.262490032020832]
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks.
This work is a step toward a theory of learning dynamics in deep state space models.
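For context, the object of study can be sketched in a few lines (a minimal example, not the paper's setup): a linear state space layer recurs a hidden state through matrices A, B, C, and the learning-dynamics question is how these matrices evolve under gradient descent:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Linear SSM: x_{t+1} = A x_t + B u_t,  y_t = C x_t."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:            # scalar input sequence
        x = A @ x + B * u_t  # B has shape (d,)
        ys.append(C @ x)     # C has shape (d,)
    return np.array(ys)

rng = np.random.default_rng(0)
d = 8
A = np.diag(rng.uniform(0.8, 0.99, d))  # stable diagonal dynamics
B, C = rng.normal(size=d), rng.normal(size=d)
y = ssm_scan(A, B, C, rng.normal(size=100))
```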
arXiv Detail & Related papers (2024-07-10T00:01:56Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds promise for building human-aligned and interpretable machine learning models. We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model. We substantiate our theoretical claims with synthetic-data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- The Impact of Geometric Complexity on Neural Collapse in Transfer Learning [6.554326244334867]
Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics. We show through experiments and theory that mechanisms that affect the geometric complexity of the pre-trained network also influence neural collapse.
arXiv Detail & Related papers (2024-05-24T16:52:09Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
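The reported recipe is straightforward to express in PyTorch; the sketch below (written here, with arbitrary layer sizes) combines LayerNorm modules with weight decay in the optimizer:

```python
import torch
import torch.nn as nn

# LayerNorm after each hidden layer plus weight decay in the optimizer:
# the combination the paper reports as effective for keeping plasticity.
model = nn.Sequential(
    nn.Linear(32, 128), nn.LayerNorm(128), nn.ReLU(),
    nn.Linear(128, 128), nn.LayerNorm(128), nn.ReLU(),
    nn.Linear(128, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
```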
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
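A minimal numpy sketch of the general idea as described above (hypothetical shapes and names, not the paper's implementation): each input carries an orientation vector, and its contribution is gated by its cosine alignment with the output's current orientation:

```python
import numpy as np

def cosine_binding(feats, weights, out_dir, eps=1e-8):
    """Gate each input weight by the input's alignment with the output.

    feats:   (n_in, d) rotating feature vectors (d = rotation dims)
    weights: (n_in,)   connection weights
    out_dir: (d,)      current output orientation
    """
    norms = np.linalg.norm(feats, axis=1) * np.linalg.norm(out_dir)
    align = feats @ out_dir / (norms + eps)       # cosine in [-1, 1]
    gated = weights * np.clip(align, 0.0, 1.0)    # bind aligned inputs only
    return (gated[:, None] * feats).sum(axis=0)   # bound output feature

rng = np.random.default_rng(0)
out = cosine_binding(rng.normal(size=(5, 4)), rng.normal(size=5),
                     rng.normal(size=4))
```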
arXiv Detail & Related papers (2024-02-08T12:31:08Z)
- Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
- Learning reversible symplectic dynamics [0.0]
We propose a new neural network architecture for learning time-reversible dynamical systems from data.
We focus on an adaptation to symplectic systems, because of their importance in physics-informed learning.
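For background (a standard construction, not the paper's architecture), the leapfrog scheme is both symplectic and time-reversible: integrating forward and then again with negated momentum returns to the initial state up to floating-point error:

```python
import numpy as np

def leapfrog(q, p, grad_V, dt, steps):
    """Symplectic, time-reversible integrator for H(q, p) = p^2/2 + V(q)."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad_V(q)
        q = q + dt * p
        p = p - 0.5 * dt * grad_V(q)
    return q, p

grad_V = lambda q: q  # harmonic potential V(q) = q^2 / 2
q0, p0 = np.array([1.0]), np.array([0.0])
q1, p1 = leapfrog(q0, p0, grad_V, 0.01, 1000)

# Reversibility check: flip the momentum and integrate forward again.
q2, p2 = leapfrog(q1, -p1, grad_V, 0.01, 1000)
assert np.allclose(q2, q0) and np.allclose(-p2, p0)
```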
arXiv Detail & Related papers (2022-04-26T14:07:40Z)
- Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups: (1) complexity and capacity-based approaches for analysing the generalizability of deep learning; (2) differential equations and their dynamic systems for modelling gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks; (5) theoretical foundations of several special structures in network architectures; and (6) concerns about ethics and security and their relationships with generalizability.
arXiv Detail & Related papers (2020-12-20T14:16:41Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
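A toy sketch of the phenomenon (constructed here, not the paper's experiments): once an easy feature fits the labels, the logistic loss saturates and the gradient signal for a second, also-predictive feature starves:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.choice([-1.0, 1.0], n)
X = np.stack([y * 2.0,                              # easy, large-margin feature
              y * 0.5 + rng.normal(0, 0.5, n)], 1)  # harder but valid feature
w = np.zeros(2)

for _ in range(2000):  # gradient descent on the logistic loss
    margins = np.clip(y * (X @ w), -60, 60)
    sigm = 1.0 / (1.0 + np.exp(margins))            # per-example loss slope
    w -= 0.1 * -(X * (y * sigm)[:, None]).mean(axis=0)

print(w)  # the easy feature dominates; the second weight stays small
```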
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks, one that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints, which also extend to the network's interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the vanishing/exploding gradient problem.
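The key ingredient, a flow that stays on the orthogonal group, can be sketched in a few lines (an illustration of the general mechanism, not the paper's exact parameterization): multiplying by the matrix exponential of a skew-symmetric generator keeps QᵀQ = I:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 6
Q = np.eye(d)         # start on the orthogonal group O(d)
A = rng.normal(size=(d, d))
S = A - A.T           # skew-symmetric, so expm(dt * S) is orthogonal

dt = 0.1
for _ in range(50):   # discrete-time matrix flow on O(d)
    Q = Q @ expm(dt * S)

print(np.abs(Q.T @ Q - np.eye(d)).max())  # ~1e-15: still orthogonal
```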
arXiv Detail & Related papers (2020-06-19T22:05:19Z)