On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
- URL: http://arxiv.org/abs/2412.11521v2
- Date: Thu, 26 Jun 2025 15:02:44 GMT
- Title: On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
- Authors: Andrea Perin, Stephane Deny
- Abstract summary: In this work, we aim to understand when and how deep networks learn symmetries from data. Inspired by real-world scenarios, we study a classification paradigm where data symmetries are only partially observed during training. In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds considerable promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks -- with standard architectures trained in a standard, supervised way -- learn symmetries from data. Inspired by real-world scenarios, we study a classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others -- only a subset. In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning. The group-cyclic nature of the dataset allows us to analyze the Gram matrix of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of class separation (signal) and class-orbit density (noise). This characterization reveals that generalization can only be successful when the local structure of the data prevails over its non-local, symmetry-induced structure, in the kernel space defined by the architecture. We extend our theoretical treatment to any finite group, including non-abelian groups. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional deep networks lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
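The following minimal sketch (an illustration under stated assumptions, not the authors' code) sets up the partially observed cyclic-orbit classification problem described in the abstract: the group acts by circular shifts of a 1-D signal, one class exposes its full orbit during training while the other exposes only a few shifts, and a plain RBF kernel stands in for the infinite-width neural kernel. All names and parameter values (`orbit`, `rbf_gram`, `G`, `K_SEEN`) are hypothetical choices made for illustration. The last two lines check the circulant structure of the Gram block of a fully observed orbit, which is what makes the Fourier-domain analysis tractable.

```python
# Toy version of the partially observed symmetry-learning setup (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

G = 16       # order of the cyclic group (number of circular shifts)
D = 16       # signal dimension; the group acts by circular shifts
K_SEEN = 4   # number of group elements observed for the partially observed class

def orbit(x):
    """Full cyclic orbit of a signal: all circular shifts."""
    return np.stack([np.roll(x, g) for g in range(G)])

# Two class prototypes; class 0 is fully observed, class 1 only partially.
proto0, proto1 = rng.normal(size=D), rng.normal(size=D)
orb0, orb1 = orbit(proto0), orbit(proto1)

X_train = np.concatenate([orb0, orb1[:K_SEEN]])           # class 1: partial orbit only
y_train = np.concatenate([np.zeros(G), np.ones(K_SEEN)])
X_test, y_test = orb1[K_SEEN:], np.ones(G - K_SEEN)        # unseen transforms of class 1

def rbf_gram(A, B, gamma=0.1):
    """Gram matrix of an RBF kernel (a stand-in for a neural kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression on +/-1 labels, evaluated on the unseen part of the orbit.
K_tr = rbf_gram(X_train, X_train)
alpha = np.linalg.solve(K_tr + 1e-6 * np.eye(len(X_train)), 2.0 * y_train - 1.0)
pred = rbf_gram(X_test, X_train) @ alpha
acc = np.mean((pred > 0) == (y_test > 0.5))
print(f"accuracy on unseen group transforms: {acc:.2f}")

# For a fully observed orbit and a shift-invariant kernel, the Gram block is
# circulant, so its eigenvalues are the DFT of its first row -- the structure
# that the Fourier-domain analysis of the Gram matrix relies on.
eigs = np.abs(np.fft.fft(rbf_gram(orb0, orb0)[0]))
print("spectrum of the circulant Gram block:", np.round(eigs, 2))
```

In this toy setting, whether the classifier extends class 1 to its unseen shifts depends on whether the kernel similarity between the two classes (local, signal-like structure) dominates the similarity induced within the orbit by the group action (non-local, symmetry-induced structure), mirroring the signal-versus-noise characterization of generalization given in the abstract.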
Related papers
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z) - Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets. We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance. This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z) - Learning Broken Symmetries with Approximate Invariance [1.0485739694839669]
In many cases, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data.
Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry.
We propose a learning model which balances the generality and performance of unconstrained networks with the rapid learning of constrained networks.
arXiv Detail & Related papers (2024-12-25T04:29:04Z) - Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks that approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Symmetry From Scratch: Group Equivariance as a Supervised Learning Task [1.8570740863168362]
In machine learning datasets with symmetries, the paradigm for backward compatibility with symmetry-breaking has been to relax equivariant architectural constraints.
We introduce symmetry-cloning, a method for inducing equivariance in machine learning models.
arXiv Detail & Related papers (2024-10-05T00:44:09Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - A Unified Framework to Enforce, Discover, and Promote Symmetry in Machine Learning [5.1105250336911405]
We provide a unifying theoretical and methodological framework for incorporating symmetry into machine learning models. We show that enforcing and discovering symmetry are linear-algebraic tasks that are dual with respect to the bilinear structure of the Lie derivative. We propose a novel way to promote symmetry by introducing a class of convex regularization functions based on the Lie derivative and nuclear norm relaxation.
arXiv Detail & Related papers (2023-11-01T01:19:54Z) - Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
arXiv Detail & Related papers (2023-01-13T16:25:25Z) - Persistence-based operators in machine learning [62.997667081978825]
We introduce a class of persistence-based neural network layers.
Persistence-based layers allow the users to easily inject knowledge about symmetries respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
arXiv Detail & Related papers (2022-12-28T18:03:41Z) - LieGG: Studying Learned Lie Group Generators [1.5293427903448025]
Symmetries built into a neural network have proven beneficial for a wide range of tasks, as they reduce the amount of data needed to learn them.
We present a method to extract symmetries learned by a neural network and to evaluate the degree to which a network is invariant to them.
arXiv Detail & Related papers (2022-10-09T20:42:37Z) - On the Symmetries of Deep Learning Models and their Internal Representations [1.418465438044804]
We seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data.
Our work suggests that the symmetries of a network propagate into the symmetries of that network's representation of the data.
arXiv Detail & Related papers (2022-05-27T22:29:08Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of a neural network's feature space may jointly serve as discriminants of the network's performance.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Encoding Involutory Invariance in Neural Networks [1.6371837018687636]
In certain situations, Neural Networks (NN) are trained on data that obey underlying physical symmetries.
In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity.
Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry.
An adaptation of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry has also been proposed.
arXiv Detail & Related papers (2021-06-07T16:07:15Z) - MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning [90.20563679417567]
This paper introduces MDP homomorphic networks for deep reinforcement learning.
MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP.
We show that such networks converge faster than unstructured networks on CartPole, a grid world and Pong.
arXiv Detail & Related papers (2020-06-30T15:38:37Z) - Detecting Symmetries with Neural Networks [0.0]
We make extensive use of the structure in the embedding layer of the neural network.
We identify whether a symmetry is present and identify the orbits of the symmetry in the input.
For this example we present a novel data representation in terms of graphs.
arXiv Detail & Related papers (2020-03-30T17:58:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.