Related papers: The Lie Derivative for Measuring Learned Equivariance

The Lie Derivative for Measuring Learned Equivariance

URL: http://arxiv.org/abs/2210.02984v2
Date: Tue, 18 Jun 2024 15:01:13 GMT
Title: The Lie Derivative for Measuring Learned Equivariance
Authors: Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson,
Abstract summary: We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities. For example, transformers can be more equivariant than convolutional neural networks after training.
Score: 84.29366874540217
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, which have no explicit architectural bias towards equivariance, challenges this narrative and suggests that augmentations and training data might also play a significant role in their performance. In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. Using the Lie derivative, we study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. The scale of our analysis allows us to separate the impact of architecture from other factors like model size or training method. Surprisingly, we find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities, and that as models get larger and more accurate they tend to display more equivariance, regardless of architecture. For example, transformers can be more equivariant than convolutional neural networks after training.

Related papers

Relaxed Equivariance via Multitask Learning [7.905957228045955]
We introduce REMUL, a training procedure for approximating equivariance with multitask learning. We show that unconstrained models can learn approximate symmetries by minimizing an additional simple equivariance loss. Our method achieves competitive performance compared to equivariant baselines while being $10 times$ faster at inference and $2.5 times$ at training.
arXiv Detail & Related papers (2024-10-23T13:50:27Z)
What Affects Learned Equivariance in Deep Image Recognition Models? [10.590129221143222]
We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet. Data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
arXiv Detail & Related papers (2023-04-05T17:54:25Z)
On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data. Invariance measures consistency of model predictions on transformations of the data. From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data. Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes. Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation. We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously. We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z)
Grounding inductive biases in natural images:invariance stems from variations in data [20.432568247732206]
We study the factors of variation in a real dataset, ImageNet. We show standard augmentation relies on a precise combination of translation and scale. We find that the main factors of variation in ImageNet mostly relate to appearance.
arXiv Detail & Related papers (2021-06-09T14:58:57Z)
Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters. We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning. We propose a novel method of using data augmentations when training autoencoders. We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data [52.78581260260455]
We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group. We apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems.
arXiv Detail & Related papers (2020-02-25T17:40:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.