An Equivariance Toolbox for Learning Dynamics
- URL: http://arxiv.org/abs/2512.21447v1
- Date: Wed, 24 Dec 2025 23:42:07 GMT
- Title: An Equivariance Toolbox for Learning Dynamics
- Authors: Yongyi Yang, Liu Ziyin
- Abstract summary: We develop a general equivariance toolbox that yields coupled first- and second-order constraints on learning dynamics. At the first order, our framework unifies conservation laws and implicit-bias relations as special cases of a single identity. At the second order, it provides structural predictions about curvature.
- Score: 13.651450618432094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many theoretical results in deep learning can be traced to symmetry or equivariance of neural networks under parameter transformations. However, existing analyses are typically problem-specific and focus on first-order consequences such as conservation laws, while the implications for second-order structure remain less understood. We develop a general equivariance toolbox that yields coupled first- and second-order constraints on learning dynamics. The framework extends classical Noether-type analyses in three directions: from gradient constraints to Hessian constraints, from symmetry to general equivariance, and from continuous to discrete transformations. At the first order, our framework unifies conservation laws and implicit-bias relations as special cases of a single identity. At the second order, it provides structural predictions about curvature: which directions are flat or sharp, how the gradient aligns with Hessian eigenspaces, and how the loss landscape geometry reflects the underlying transformation structure. We illustrate the framework through several applications, recovering known results while also deriving new characterizations that connect transformation structure to modern empirical observations about optimization geometry.
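A standard special case, not spelled out in the abstract, illustrates how first- and second-order constraints couple: a loss invariant under a one-parameter group with a symmetric generator (the notation $A$, $Q$ below is illustrative, not taken from the paper).

```latex
% Assume L(e^{sA}\theta) = L(\theta) for all s, with A = A^\top.
% First order: differentiating at s = 0 gives the identity
\nabla L(\theta)^\top A\,\theta = 0 .
% Under gradient flow \dot{\theta} = -\nabla L(\theta), the quantity
Q(\theta) = \tfrac{1}{2}\,\theta^\top A\,\theta
% is conserved, since
\dot{Q} = \dot{\theta}^\top A\,\theta = -\nabla L(\theta)^\top A\,\theta = 0 .
% Second order: differentiating the first-order identity in \theta gives
H(\theta)\,A\theta + A\,\nabla L(\theta) = 0 ,
% so the Hessian maps A\theta onto (minus) the transformed gradient, and at a
% critical point A\theta is a zero-eigenvalue (flat) direction of H.
```

For the familiar rescaling symmetry $(W_1, W_2) \mapsto (e^s W_1, e^{-s} W_2)$ of a two-layer linear network, $A = \mathrm{diag}(I, -I)$ and $Q = \tfrac{1}{2}(\|W_1\|_F^2 - \|W_2\|_F^2)$, the classical balancedness invariant.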
Related papers
- Thermodynamic Response Functions in Singular Bayesian Models [0.12183405753834557]
We formalize an observable algebra that quotients out non-identifiable directions, allowing structurally meaningful order parameters to be constructed in singular models. Our results suggest that thermodynamic response theory provides a natural organizing framework for interpreting complexity, predictive variability, and structural reorganization in singular Bayesian learning.
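The summary does not state the paper's observables, but the thermodynamic language it uses is standard in singular learning theory, where response functions arise as temperature derivatives of the Bayesian free energy. A sketch in that standard notation (assumed here, not taken from the paper):

```latex
% Posterior at inverse temperature \beta with prior \varphi and empirical loss L_n:
Z_n(\beta) = \int \exp\!\bigl(-n\beta L_n(w)\bigr)\,\varphi(w)\,dw ,
\qquad F_n(\beta) = -\log Z_n(\beta).
% Response functions are \beta-derivatives of the free energy, e.g.
\mathbb{E}_\beta[\,n L_n\,] = \partial_\beta F_n(\beta),
\qquad
\operatorname{Var}_\beta[\,n L_n\,] = -\,\partial_\beta^2 F_n(\beta),
% the latter playing the role of a specific heat.
```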
arXiv Detail & Related papers (2026-03-05T18:50:20Z) - A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure [0.0]
This paper develops a general weighting theory for ensemble learning. We formalize ensembles as linear operators acting on a hypothesis space. We show how non-uniform, structured weights can outperform uniform averaging.
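A minimal, classical instance of structured weights beating uniform averaging is minimum-variance weighting of unbiased predictors with correlated errors; the sketch below uses an invented covariance and does not reproduce the paper's spectral machinery.

```python
import numpy as np

def min_variance_weights(Sigma):
    """Weights minimizing the variance of a convex combination of
    unbiased predictors with error covariance Sigma:
    w* = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / w.sum()

# Two strongly correlated models plus one noisier, nearly independent one
# (values invented for illustration).
Sigma = np.array([[1.0, 0.9, 0.1],
                  [0.9, 1.0, 0.1],
                  [0.1, 0.1, 2.0]])
w = min_variance_weights(Sigma)
uniform = np.full(3, 1 / 3)
print(w)                          # non-uniform, structured weights
print(uniform @ Sigma @ uniform)  # variance of uniform averaging
print(w @ Sigma @ w)              # strictly smaller ensemble variance
```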
arXiv Detail & Related papers (2025-12-25T08:51:01Z) - The Neural Differential Manifold: An Architecture with Explicit Geometric Structure [8.201374511929538]
This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design. We analyze the theoretical advantages of this approach, including its potential for more efficient optimization, enhanced continual learning, and applications in scientific discovery and controllable generative modeling.
arXiv Detail & Related papers (2025-10-29T02:24:27Z) - VIKING: Deep variational inference with stochastic projections [48.946143517489496]
Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. We propose a simple variational family that considers two independent linear subspaces of the parameter space. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization.
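The exact two-subspace construction is not given in this summary; the sketch below shows only the generic ingredient, a Gaussian variational posterior supported on a linear subspace of parameter space (all names, shapes, and the single-subspace simplification are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
D, k = 10_000, 16                 # parameter dimension, subspace rank (assumed)
theta_hat = rng.normal(size=D)    # stand-in for trained network parameters
P, _ = np.linalg.qr(rng.normal(size=(D, k)))  # orthonormal subspace basis (D, k)

# Variational parameters live in the k-dim subspace: z ~ N(m, L L^T),
# so samples theta = theta_hat + P z carry the fully correlated
# (rank-k) covariance P L L^T P^T in the full parameter space.
m = np.zeros(k)
L = 0.1 * np.tril(rng.normal(size=(k, k)))   # Cholesky-style factor of S
theta_sample = theta_hat + P @ (m + L @ rng.normal(size=k))
```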
arXiv Detail & Related papers (2025-10-27T15:38:35Z) - Understanding Post-Training Structural Changes in Large Language Models [3.054513120350576]
Post-training fundamentally alters the behavior of large language models (LLMs). This work focuses on two widely adopted post-training methods: instruction tuning and long-chain-of-thought (Long-CoT) distillation.
arXiv Detail & Related papers (2025-09-22T15:03:36Z) - Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
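"Low- or zero-barrier" has a concrete measurement: the excess loss along the straight line between two parameter vectors after one model has been aligned by a symmetry map. Below is a sketch of the barrier computation only; the alignment step, which is the paper's contribution, is left abstract, and the helper names are assumptions.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, n_points=25):
    """Max excess loss on the linear path between theta_a and theta_b,
    relative to linear interpolation of the endpoint losses."""
    alphas = np.linspace(0.0, 1.0, n_points)
    path = np.array([loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas])
    chord = (1 - alphas) * path[0] + alphas * path[-1]
    return float(np.max(path - chord))

# Usage sketch: first map theta_b with a permutation, semi-permutation,
# or general invertible map (the paper's four symmetry classes), e.g.
#   theta_b_aligned = align(theta_a, theta_b)   # hypothetical helper
# then check that loss_barrier(loss_fn, theta_a, theta_b_aligned) is near 0.
```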
arXiv Detail & Related papers (2025-06-28T01:46:36Z) - Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning [73.18052192964349]
We develop a theoretical framework that explains how discrete symbolic structures can emerge naturally from continuous neural network training dynamics. By lifting neural parameters to a measure space and modeling training as Wasserstein gradient flow, we show that under geometric constraints, the parameter measure $\mu_t$ undergoes two concurrent phenomena.
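For context, the lifted dynamics mentioned in the summary have a standard mean-field form; the generic statement below, with an unspecified functional $F$, does not include the paper's geometric constraints.

```latex
% Training as Wasserstein gradient flow of a functional F over parameter measures:
\partial_t \mu_t \;=\; \nabla_w \cdot \Bigl( \mu_t \,\nabla_w \frac{\delta F}{\delta \mu}[\mu_t] \Bigr),
% the mean-field limit of gradient descent on particles w_i(t):
\dot{w}_i(t) \;=\; -\,\nabla_w \frac{\delta F}{\delta \mu}[\mu_t]\bigl(w_i(t)\bigr).
```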
arXiv Detail & Related papers (2025-06-26T22:40:30Z) - Constrained belief updates explain geometric structures in transformer representations [1.1666234644810893]
We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models. Our analysis focuses on single-layer transformers, revealing how the first attention layer implements constrained updates. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail.
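The constrained updates are approximations to exact Bayesian filtering over the HMM's hidden state; the exact update being approximated is the standard one below (notation assumed here, not taken from the paper).

```latex
% HMM with transition matrix T_{ss'} and emission probabilities O(x \mid s).
% On observing symbol x, the belief vector b over hidden states updates as
b'_{s'} \;=\; \frac{O(x \mid s') \sum_{s} T_{s s'}\, b_s}
                   {\sum_{u} O(x \mid u) \sum_{s} T_{s u}\, b_s}.
```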
arXiv Detail & Related papers (2025-02-04T03:03:54Z) - STITCH: Surface reconstrucTion using Implicit neural representations with Topology Constraints and persistent Homology [23.70495314317551]
We present STITCH, a novel approach for neural implicit surface reconstruction of a sparse and irregularly spaced point cloud. We develop a new differentiable framework based on persistent homology to formulate topological loss terms that enforce the prior of a single 2-manifold object.
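As a generic illustration of a persistence-based loss (not the paper's exact terms), a penalty enforcing a single connected component can be written from the 0-dimensional persistence pairs of the reconstructed shape:

```latex
% Let (b_i, d_i), i = 1, 2, \dots be the 0-dimensional persistence pairs,
% ordered by decreasing persistence; keeping the most persistent component
% and penalizing all others encourages a single connected object:
\mathcal{L}_{\mathrm{topo}} \;=\; \sum_{i \ge 2} (d_i - b_i)^2 .
```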
arXiv Detail & Related papers (2024-12-24T22:55:35Z) - Relative Representations: Topological and Geometric Perspectives [50.85040046976025]
Relative representations are an established approach to zero-shot model stitching. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
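The base construction being normalized and regularized is simple to state: represent each sample by its cosine similarities to a fixed set of anchor embeddings. Below is a sketch of that base map only (shapes and names are assumptions); the paper's normalization extends its invariance beyond rotations and isotropic rescaling.

```python
import numpy as np

def relative_representation(X, anchors):
    """Cosine similarities of each embedding to fixed anchor embeddings.

    X: (n, d) absolute embeddings; anchors: (k, d) anchor embeddings.
    Returns (n, k) relative embeddings, invariant to orthogonal
    transformations and isotropic rescaling of the absolute space.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T
```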
arXiv Detail & Related papers (2024-09-17T08:09:22Z) - A Theory of Topological Derivatives for Inverse Rendering of Geometry [87.49881303178061]
We introduce a theoretical framework for differentiable surface evolution that allows discrete topology changes through the use of topological derivatives.
We validate the proposed theory with optimization of closed curves in 2D and surfaces in 3D to lend insights into limitations of current methods.
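The summary leaves the central object implicit; the classical definition of the topological derivative of a shape functional, which such frameworks build on, is (in standard notation):

```latex
% Topological derivative of a shape functional J at an interior point x:
% the first-order response to nucleating an infinitesimal hole \omega_\epsilon(x),
D_T J(x) \;=\; \lim_{\epsilon \to 0^{+}}
\frac{J\bigl(\Omega \setminus \omega_\epsilon(x)\bigr) - J(\Omega)}{\rho(\epsilon)},
% with \rho(\epsilon) \to 0 a normalizing term; topology changes are
% introduced where D_T J(x) < 0.
```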
arXiv Detail & Related papers (2023-08-19T00:55:55Z) - Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms [67.88675386638043]
The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks.
We introduce windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets.
We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts.
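For context on what is being generalized, the symmetric baseline builds wavelets from dyadic powers of a lazy random walk; a sketch in standard notation (the paper's asymmetric wavelet class is not reproduced):

```latex
% Lazy random walk P = \tfrac{1}{2}(I + A D^{-1}) on a graph with
% adjacency A and degree matrix D; diffusion wavelets at dyadic scales:
\Psi_j \;=\; P^{2^{j-1}} - P^{2^{j}}, \qquad j \ge 1,
% and (second-order) scattering coefficients of a graph signal x:
S_{j,k}\,x \;=\; \bigl\| \Psi_k \bigl|\Psi_j x\bigr| \bigr\|_1 .
```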
arXiv Detail & Related papers (2019-11-14T17:23:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.