Related papers: Symmetry and Generalisation in Neural Approximations of Renormalisation Transformations

Symmetry and Generalisation in Neural Approximations of Renormalisation Transformations

URL: http://arxiv.org/abs/2510.16591v1
Date: Sat, 18 Oct 2025 17:29:23 GMT
Title: Symmetry and Generalisation in Neural Approximations of Renormalisation Transformations
Authors: Cassidy Ashworth, Pietro Liò, Francesco Caso,
Abstract summary: We evaluate the role of symmetry and network expressivity in the generalisation behaviour of neural networks.<n>We consider simple multilayer perceptrons (MLPs) and graph neural networks (GNNs)<n>Our results reveal a competition between symmetry constraints and expressivity, with overly complex models generalising poorly.
Score: 11.337632710839166
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning models have proven enormously successful at using multiple layers of representation to learn relevant features of structured data. Encoding physical symmetries into these models can improve performance on difficult tasks, and recent work has motivated the principle of parameter symmetry breaking and restoration as a unifying mechanism underlying their hierarchical learning dynamics. We evaluate the role of parameter symmetry and network expressivity in the generalisation behaviour of neural networks when learning a real-space renormalisation group (RG) transformation, using the central limit theorem (CLT) as a test case map. We consider simple multilayer perceptrons (MLPs) and graph neural networks (GNNs), and vary weight symmetries and activation functions across architectures. Our results reveal a competition between symmetry constraints and expressivity, with overly complex or overconstrained models generalising poorly. We analytically demonstrate this poor generalisation behaviour for certain constrained MLP architectures by recasting the CLT as a cumulant recursion relation and making use of an established framework to propagate cumulants through MLPs. We also empirically validate an extension of this framework from MLPs to GNNs, elucidating the internal information processing performed by these more complex models. These findings offer new insight into the learning dynamics of symmetric networks and their limitations in modelling structured physical transformations.

Related papers

The Neural Differential Manifold: An Architecture with Explicit Geometric Structure [8.201374511929538]
This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design.<n>We analyze the theoretical advantages of this approach, including its potential for more efficient optimization, enhanced continual learning, and applications in scientific discovery and controllable generative modeling.
arXiv Detail & Related papers (2025-10-29T02:24:27Z)
Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning [58.533203990515034]
Scaling neural networks has driven breakthrough advances in machine learning, yet this paradigm fails in deep reinforcement learning (DRL)<n>We show that dynamic sparse training strategies provide module-specific benefits that complement the primary scalability foundation established by architectural improvements.<n>We finally distill these insights into Module-Specific Training (MST), a practical framework that exploits the benefits of architectural improvements and demonstrates substantial scalability gains across diverse RL algorithms without algorithmic modifications.
arXiv Detail & Related papers (2025-10-14T03:03:08Z)
On Linear Mode Connectivity of Mixture-of-Experts Architectures [1.6747713135100666]
We investigate the phenomenon of linear Mode Connectivity (LMC) in neural networks.<n>LMC is a notable phenomenon in the loss landscapes of neural networks, wherein independently trained models have been to be connected--up to varying symmetries of an algorithm.
arXiv Detail & Related papers (2025-09-14T16:51:41Z)
Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.<n>Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.<n>We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.<n>This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment.<n>We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships.<n>Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics [1.8434042562191815]
Kurkova-Kolmogorov-Arnold Network (KKAN) is a new two-block architecture that combines robust multi-layer perceptron (MLP) based inner functions with flexible linear combinations of basis functions as outer functions.<n> benchmark results show that KKANs outperform original Kolmogorov-Arnold Networks (KANs) in function approximation and operator learning tasks.
arXiv Detail & Related papers (2024-12-21T19:01:38Z)
Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs) Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models Lattice Boltzmann collision operators. Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
Symmetry-enforcing neural networks with applications to constitutive modeling [0.0]
We show how to combine state-of-the-art micromechanical modeling and advanced machine learning techniques to homogenize complex microstructures exhibiting non-linear and history dependent behaviors. The resulting homogenized model, termed smart law (SCL), enables the adoption of microly informed laws into finite element solvers at a fraction of the computational cost required by traditional concurrent multiscale approaches. In this work, the capabilities of SCLs are expanded via the introduction of a novel methodology that enforces material symmetries at the neuron level.
arXiv Detail & Related papers (2023-12-21T01:12:44Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs) We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.