On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
- URL: http://arxiv.org/abs/2412.11521v1
- Date: Mon, 16 Dec 2024 07:56:54 GMT
- Title: On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
- Authors: Andrea Perin, Stephane Deny,
- Abstract summary: We show that generalization can only be successful when the local structure of the data prevails over its non-local, symmetric, structure.
Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
- Score: 0.0
- License:
- Abstract: Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds significant promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks can learn symmetries from data. We focus on a supervised classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others include only a subset. We ask: can deep networks generalize symmetry invariance to the partially sampled classes? In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning to address this question. The group-cyclic nature of the dataset allows us to analyze the spectrum of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of the interaction between class separation (signal) and class-orbit density (noise). We observe that generalization can only be successful when the local structure of the data prevails over its non-local, symmetric, structure, in the kernel space defined by the architecture. This occurs when (1) classes are sufficiently distinct and (2) class orbits are sufficiently dense. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional networks trained with supervision lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
Related papers
- Learning Broken Symmetries with Approximate Invariance [1.0485739694839669]
In many cases, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data.
Standard approaches, such as data augmentation or equivariant networks fail to represent the nature of the full, broken symmetry.
We propose a learning model which balances the generality and performance of unconstrained networks with the rapid learning of constrained networks.
arXiv Detail & Related papers (2024-12-25T04:29:04Z) - Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs)
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras
from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the transformations symmetry and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
arXiv Detail & Related papers (2023-01-13T16:25:25Z) - LieGG: Studying Learned Lie Group Generators [1.5293427903448025]
Symmetries built into a neural network have appeared to be very beneficial for a wide range of tasks as it saves the data to learn them.
We present a method to extract symmetries learned by a neural network and to evaluate the degree to which a network is invariant to them.
arXiv Detail & Related papers (2022-10-09T20:42:37Z) - On the Symmetries of Deep Learning Models and their Internal
Representations [1.418465438044804]
We seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data.
Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data.
arXiv Detail & Related papers (2022-05-27T22:29:08Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Encoding Involutory Invariance in Neural Networks [1.6371837018687636]
In certain situations, Neural Networks (NN) are trained upon data that obey underlying physical symmetries.
In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity.
Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry.
An adaption of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry has also been proposed.
arXiv Detail & Related papers (2021-06-07T16:07:15Z) - MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning [90.20563679417567]
This paper introduces MDP homomorphic networks for deep reinforcement learning.
MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP.
We show that such networks converge faster than unstructured networks on CartPole, a grid world and Pong.
arXiv Detail & Related papers (2020-06-30T15:38:37Z) - Detecting Symmetries with Neural Networks [0.0]
We make extensive use of the structure in the embedding layer of the neural network.
We identify whether a symmetry is present and to identify orbits of the symmetry in the input.
For this example we present a novel data representation in terms of graphs.
arXiv Detail & Related papers (2020-03-30T17:58:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.