A General Framework For Proving The Equivariant Strong Lottery Ticket
Hypothesis
- URL: http://arxiv.org/abs/2206.04270v1
- Date: Thu, 9 Jun 2022 04:40:18 GMT
- Title: A General Framework For Proving The Equivariant Strong Lottery Ticket
Hypothesis
- Authors: Damien Ferbach, Christos Tsirigotis, Gauthier Gidel, and Avishek
(Joey) Bose
- Abstract summary: Modern neural networks are capable of incorporating more than just translation symmetry.
We generalize the Strong Lottery Ticket Hypothesis (SLTH) to functions that preserve the action of the group $G$.
We empirically verify our theory by pruning overparametrized $\text{E}(2)$-steerable CNNs and message passing GNNs.
- Score: 15.376680573592997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a
subnetwork within a sufficiently overparameterized (dense) neural network that
-- when initialized randomly and without any training -- achieves the accuracy
of a fully trained target network. Recent work by \citet{da2022proving}
demonstrates that the SLTH can also be extended to translation equivariant
networks -- i.e. CNNs -- with the same level of overparametrization as needed
for SLTs in dense networks. However, modern neural networks can incorporate
more than just translation symmetry, and developing architectures equivariant
to other transformations, such as rotations and permutations, has been a
powerful design principle. In this paper, we generalize the SLTH to functions
that preserve the action of the group $G$ -- i.e., $G$-equivariant networks -- and
prove, with high probability, that one can prune a randomly initialized
overparametrized $G$-equivariant network to a $G$-equivariant subnetwork that
approximates another fully trained $G$-equivariant network of fixed width and
depth. We further prove that our prescribed overparametrization scheme is also
optimal as a function of the error tolerance. We develop our theory for a large
range of groups, including important ones such as subgroups of the Euclidean
group $\text{E}(n)$ and subgroups of the symmetric group $G \leq \mathcal{S}_n$
-- allowing us to find SLTs for MLPs, CNNs, $\text{E}(2)$-steerable CNNs, and
permutation equivariant networks as specific instantiations of our unified
framework which completely extends prior work. Empirically, we verify our
theory by pruning overparametrized $\text{E}(2)$-steerable CNNs and message
passing GNNs to match the performance of trained target networks within a given
error tolerance.
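Existence proofs in this line of work (e.g., Pensia et al., 2020, for dense networks, and \citet{da2022proving} for CNNs) reduce the approximation of each trained weight to a random subset-sum problem: with high probability, some subset of $O(\log(1/\epsilon))$ random weights sums to within $\epsilon$ of the target, and pruning keeps exactly that subset. The toy script below is our own illustration of this one-weight reduction under that assumption, not the paper's construction; the function name subset_sum_approx and all parameters are ours. In the equivariant setting the mask must additionally zero out whole basis coefficients at once so the pruned network still satisfies $f(g \cdot x) = g \cdot f(x)$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def subset_sum_approx(target, candidates, eps):
    """Brute-force search for a subset of random weights whose sum
    approximates `target` within `eps`; returns the 0/1 pruning mask."""
    best_mask, best_err = None, np.inf
    for bits in itertools.product([0, 1], repeat=len(candidates)):
        err = abs(target - np.dot(bits, candidates))
        if err < best_err:
            best_mask, best_err = np.array(bits), err
    return best_mask, best_err

# One target weight of a trained network, drawn from [-1, 1].
target = 0.731
# A modest batch of random weights suffices with high probability:
# subset-sum results suggest O(log(1/eps)) candidates per target weight.
candidates = rng.uniform(-1.0, 1.0, size=16)
mask, err = subset_sum_approx(target, candidates, eps=1e-3)
print(f"approximation error: {err:.2e}")  # typically well below 1e-3
```

With 16 candidates the achieved error is typically far below the $10^{-3}$ tolerance, consistent with overparametrization that is only logarithmic in $1/\epsilon$.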
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - G-RepsNet: A Fast and General Construction of Equivariant Networks for
Arbitrary Matrix Groups [8.24167511378417]
Group equivariant networks are useful in a wide range of deep learning tasks.
Here, we introduce Group Representation Networks (G-RepsNets), a lightweight equivariant network for arbitrary groups.
We show that G-RepsNet is competitive with G-FNO (Helwig et al., 2023) and EGNN (Satorras et al., 2021) on solving PDEs and N-body predictions, respectively.
arXiv Detail & Related papers (2024-02-23T16:19:49Z) - Implicit Convolutional Kernels for Steerable CNNs [5.141137421503899]
Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and transformations of an origin-preserving group $G$.
We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels.
We demonstrate the effectiveness of our method on multiple tasks, including N-body simulations, point cloud classification, and molecular property prediction; a minimal sketch of the implicit-kernel idea appears after this list.
arXiv Detail & Related papers (2022-12-12T18:10:33Z) - Equivariant Transduction through Invariant Alignment [71.45263447328374]
We introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism.
We find that our network's structure allows it to develop stronger equivariant properties than existing group-equivariant approaches.
We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task.
arXiv Detail & Related papers (2022-09-22T11:19:45Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Universality of group convolutional neural networks based on ridgelet
analysis on groups [10.05944106581306]
We investigate the approximation property of group convolutional neural networks (GCNNs) based on the ridgelet theory.
We formulate a versatile GCNN as a nonlinear mapping between group representations.
arXiv Detail & Related papers (2022-05-30T02:52:22Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild overparameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Dual Lottery Ticket Hypothesis [71.95937879869334]
The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining network capacity.
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, form a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - Implicit Bias of Linear Equivariant Networks [2.580765958706854]
Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs).
We show that $L$-layer full-width linear G-CNNs trained via gradient descent converge to solutions with low-rank Fourier matrix coefficients.
arXiv Detail & Related papers (2021-10-12T15:34:25Z)
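As a companion to the implicit-kernel entry above, here is a minimal, self-contained sketch of the idea in its simplest special case. This is our own illustration, not the paper's construction: the helper names implicit_kernel and conv2d_valid, the layer sizes, and the restriction to 90-degree rotations are all assumptions. An MLP consumes only a rotation-invariant feature of the kernel coordinates (the radius), so the generated kernel is isotropic, and convolution with it commutes with 90-degree rotations, which the final check verifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MLP: maps a rotation-invariant coordinate feature
# (the radius |x|) to a kernel value, so the resulting kernel is
# automatically invariant under rotations of the grid.
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def implicit_kernel(size=5):
    """Generate a conv kernel by evaluating the MLP on each offset's radius."""
    coords = np.arange(size) - size // 2
    xx, yy = np.meshgrid(coords, coords)
    radius = np.sqrt(xx**2 + yy**2).reshape(-1, 1)  # invariant under rot90
    hidden = np.tanh(radius @ W1.T + b1)
    return (hidden @ W2.T + b2).reshape(size, size)

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding."""
    k = kernel.shape[0]
    out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = rng.normal(size=(12, 12))
kernel = implicit_kernel()
# Equivariance check: rotating the input and then convolving matches
# convolving first and then rotating the output.
lhs = conv2d_valid(np.rot90(image), kernel)
rhs = np.rot90(conv2d_valid(image, kernel))
print(np.allclose(lhs, rhs))  # True
```

The actual method parameterizes full $G$-steerable kernels by constraining the MLP itself rather than its inputs; feeding the MLP only invariant features is the crudest route to equivariance and yields only isotropic (scalar-field) kernels.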