Implicit Bias of Linear Equivariant Networks
- URL: http://arxiv.org/abs/2110.06084v1
- Date: Tue, 12 Oct 2021 15:34:25 GMT
- Title: Implicit Bias of Linear Equivariant Networks
- Authors: Hannah Lawrence, Kristian Georgiev, Andrew Dienes, Bobak T. Kiani
- Abstract summary: Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs).
We show that $L$-layer full-width linear G-CNNs trained via gradient descent converge to solutions with low-rank Fourier matrix coefficients.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Group equivariant convolutional neural networks (G-CNNs) are generalizations
of convolutional neural networks (CNNs) which excel in a wide range of
scientific and technical applications by explicitly encoding group symmetries,
such as rotations and permutations, in their architectures. Although the
success of G-CNNs is driven by the explicit symmetry bias of their
convolutional architecture, a recent line of work has proposed that the
implicit bias of training algorithms on a particular parameterization (or
architecture) is key to understanding generalization for overparameterized
neural nets. In this context, we show that $L$-layer full-width linear G-CNNs
trained via gradient descent in a binary classification task converge to
solutions with low-rank Fourier matrix coefficients, regularized by the
$2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on
the implicit bias of linear CNNs to linear G-CNNs over all finite groups,
including the challenging setting of non-commutative symmetry groups (such as
permutations). We validate our theorems via experiments on a variety of groups
and empirically explore more realistic nonlinear networks, which locally
capture similar regularization patterns. Finally, we provide intuitive
interpretations of our Fourier space implicit regularization results in real
space via uncertainty principles.
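To make the main claim concrete, here is a schematic statement of the result, with constants, dimension weightings, and regularity conditions deferred to the paper, and with $\hat{\beta}(\rho)$ denoting (notation assumed here) the Fourier coefficient matrix of the end-to-end linear predictor $\beta$ at the irreducible representation $\rho$ of the group $G$: gradient descent converges in direction to a first-order stationary point of $\min_{\beta} \sum_{\rho \in \widehat{G}} \|\hat{\beta}(\rho)\|_{S_{2/L}}^{2/L}$ subject to the margin constraints $y_n \langle x_n, \beta \rangle \geq 1$ for all training pairs $(x_n, y_n)$. Here $\|\cdot\|_{S_{2/L}}$ is the Schatten-$2/L$ (quasi-)norm, i.e. the $\ell_{2/L}$ norm of the singular values; it reduces to the nuclear norm when $L = 2$ and is nonconvex for $L > 2$, and in either case the penalty favors low-rank Fourier coefficient matrices.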
Related papers
- Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks [5.187307904567701]
Group Matrices (GMs) are a forgotten precursor to the modern notion of regular representations of finite groups.
We show that GMs can be employed to extend all the elementary operations of CNNs to general discrete groups (a minimal illustration is sketched below).
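Purely as an illustration of the group-matrix idea (a generic sketch, not code from the cited paper), the snippet below builds the group matrix of a filter over the cyclic group Z_n from the group's multiplication and inverse tables, and checks that applying it reproduces ordinary circular convolution; swapping in the Cayley table of any other finite group gives the corresponding group convolution.

```python
import numpy as np

def group_matrix(filt, prod, inv):
    """Group matrix M[i, j] = filt[g_i * g_j^{-1}], given the group's
    multiplication table `prod` and inverse table `inv` (element indices)."""
    n = len(filt)
    return np.array([[filt[prod[i, inv[j]]] for j in range(n)] for i in range(n)])

# Cyclic group Z_n as the simplest example: g_i = i, product = addition mod n.
n = 5
prod = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
inv = (-np.arange(n)) % n

rng = np.random.default_rng(0)
f, x = rng.standard_normal(n), rng.standard_normal(n)

M = group_matrix(f, prod, inv)   # circulant matrix in the Z_n case
y_group = M @ x                  # group convolution as a matrix-vector product

# Cross-check against ordinary circular convolution via the FFT.
y_circ = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(x)))
assert np.allclose(y_group, y_circ)
```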
arXiv Detail & Related papers (2024-09-18T07:52:33Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Neural Tangent Kernels Motivate Graph Neural Networks with Cross-Covariance Graphs [94.44374472696272]
We investigate NTKs and alignment in the context of graph neural networks (GNNs).
Our results establish theoretical guarantees on the optimality of the alignment for a two-layer GNN.
These guarantees are characterized by the graph shift operator being a function of the cross-covariance between the input and the output data.
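Read purely as an illustration (a generic sketch under assumed notation, not code from the cited paper): if input and output data are signals on the same set of nodes, one can form a graph shift operator from their symmetrized empirical cross-covariance and use it in a polynomial graph filter, as below.

```python
import numpy as np

rng = np.random.default_rng(1)
num_samples, num_nodes = 200, 8

# Toy paired input/output graph signals (one value per node, per sample).
X = rng.standard_normal((num_samples, num_nodes))
Y = X @ rng.standard_normal((num_nodes, num_nodes)) + 0.1 * rng.standard_normal((num_samples, num_nodes))

# Symmetrized empirical cross-covariance, used here as a graph shift operator (GSO).
C = (X - X.mean(0)).T @ (Y - Y.mean(0)) / num_samples
S = 0.5 * (C + C.T)

def graph_filter(S, x, taps):
    """Apply the polynomial graph filter sum_k taps[k] * S^k to the signal x."""
    out, Sk_x = np.zeros_like(x), x.copy()
    for h in taps:
        out += h * Sk_x
        Sk_x = S @ Sk_x
    return out

y_filtered = graph_filter(S, rng.standard_normal(num_nodes), taps=[1.0, 0.5, 0.25])
```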
arXiv Detail & Related papers (2023-10-16T19:54:21Z) - Geometrical aspects of lattice gauge equivariant convolutional neural networks [0.0]
Lattice gauge equivariant convolutional neural networks (L-CNNs) are a framework for convolutional neural networks that can be applied to non-Abelian lattice gauge theories.
arXiv Detail & Related papers (2023-03-20T20:49:08Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
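As a generic illustration of the symmetry in question (not code from the cited paper): permuting the hidden neurons of a small feedforward network, i.e. permuting the rows of the first weight matrix and the columns of the second consistently, leaves the network function unchanged, and this is the weight-space symmetry a permutation equivariant neural functional is built to respect.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 6, 3
W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))

def mlp(x, W1, W2):
    """Two-layer ReLU network without biases."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# Permute the hidden neurons: rows of W1 and columns of W2 move together.
perm = rng.permutation(d_hidden)
W1_p, W2_p = W1[perm], W2[:, perm]

x = rng.standard_normal(d_in)
assert np.allclose(mlp(x, W1, W2), mlp(x, W1_p, W2_p))  # identical outputs
```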
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs), where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance, while being general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - Lattice gauge symmetry in neural networks [0.0]
We review a novel neural network architecture called lattice gauge equivariant convolutional neural networks (L-CNNs).
We discuss the concept of gauge equivariance, which we use to explicitly construct a gauge equivariant convolutional layer and a bilinear layer.
The performance of L-CNNs and non-equivariant CNNs is compared using seemingly simple non-linear regression tasks.
arXiv Detail & Related papers (2021-11-08T11:20:11Z) - Lattice gauge equivariant convolutional neural networks [0.0]
We propose Lattice gauge equivariant Convolutional Neural Networks (L-CNNs) for generic machine learning applications.
We show that L-CNNs can learn and generalize gauge invariant quantities that traditional convolutional neural networks are incapable of finding.
arXiv Detail & Related papers (2020-12-23T19:00:01Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)