Does equivariance matter at scale?
- URL: http://arxiv.org/abs/2410.23179v1
- Date: Wed, 30 Oct 2024 16:36:59 GMT
- Title: Does equivariance matter at scale?
- Authors: Johann Brehmer, Sönke Behrends, Pim de Haan, Taco Cohen
- Abstract summary: We study how equivariant and non-equivariant networks scale with compute and training samples.
First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs.
Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget.
- Score: 15.247352029530523
- Abstract: Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
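The power-law scaling with compute reported in the abstract can be checked from raw (compute, loss) measurements with an ordinary least-squares fit in log-log space. The sketch below uses made-up illustrative numbers, not the paper's data:

```python
import numpy as np

# Hypothetical (compute budget, test loss) pairs; real values would come
# from training runs at several compute budgets.
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = np.array([0.80, 0.45, 0.25, 0.14])

# A power law L(C) = a * C**(-b) is linear in log-log space:
# log L = log a - b * log C, so a linear fit recovers (a, b).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b = -slope
a = np.exp(intercept)
print(f"fitted exponent b = {b:.3f}")
```

Comparing the fitted exponents (and offsets) of equivariant and non-equivariant runs is one way to make the "outperforming at each tested compute budget" claim concrete.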
Related papers
- On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds [1.4079337353605066]
This paper explores the key factors that influence the performance of models working with point clouds.
We identify the key aspects of equivariant and non-equivariant architectures that drive success in different tasks.
arXiv Detail & Related papers (2025-01-01T07:00:41Z)
- Relaxed Equivariance via Multitask Learning [7.905957228045955]
We introduce REMUL, a training procedure for approximating equivariance with multitask learning.
We show that unconstrained models can learn approximate symmetries by minimizing an additional simple equivariance loss.
Our method achieves competitive performance compared to equivariant baselines while being $10\times$ faster at inference and $2.5\times$ faster at training.
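The idea of an additional equivariance loss can be sketched with a toy example: penalize the gap between transforming the input and then applying the model, versus applying the model and then transforming the output. The map `f`, the rotation, and the data below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def equivariance_penalty(f, x, R):
    """Squared equivariance gap ||f(R x) - R f(x)||^2 for a map f: R^2 -> R^2.
    In a REMUL-style setup, this term would be added to the task loss with a
    weight, letting an unconstrained model learn approximate symmetry."""
    return float(np.sum((f(x @ R.T) - f(x) @ R.T) ** 2))

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.random.default_rng(0).normal(size=(8, 2))

linear = lambda z: z * 2.0        # commutes with rotation: equivariant
biased = lambda z: z * 2.0 + 1.0  # constant offset breaks equivariance

print(equivariance_penalty(linear, x, R))  # ~0
print(equivariance_penalty(biased, x, R))  # > 0
```

During training, this penalty would be weighted against the task loss, trading off how strictly the symmetry is enforced.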
arXiv Detail & Related papers (2024-10-23T13:50:27Z)
- OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance [65.48009829137824]
Large-scale 3D parallel training on vision-language instruct-tuning models leads to an imbalanced computation load across different devices.
We rebalanced the computational loads from data, model, and memory perspectives to address this issue.
Our method's efficacy and generalizability were further demonstrated across various models and datasets.
arXiv Detail & Related papers (2024-07-30T12:02:58Z)
- Approximately Equivariant Neural Processes [47.14384085714576]
When modelling real-world data, learning problems are often not exactly equivariant, but only approximately.
Current approaches to modelling approximate equivariance usually cannot be applied out-of-the-box to an arbitrary architecture and symmetry group.
We develop a general approach to achieving this using existing equivariant architectures.
arXiv Detail & Related papers (2024-06-19T12:17:14Z)
- What Affects Learned Equivariance in Deep Image Recognition Models? [10.590129221143222]
We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet.
Data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
arXiv Detail & Related papers (2023-04-05T17:54:25Z)
- The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
For example, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
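Invariance in this sense can be estimated as the fraction of inputs whose prediction is unchanged under a transformation. The toy classifiers and rotation below are assumptions for illustration, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Two toy classifiers: one rotation-invariant (uses the radius only),
# one rotation-sensitive (uses the first coordinate).
invariant = lambda z: (np.linalg.norm(z, axis=1) > 1.0).astype(int)
sensitive = lambda z: (z[:, 0] > 0.0).astype(int)

def invariance_score(clf, x, R):
    """Fraction of inputs whose prediction survives the transformation."""
    return float(np.mean(clf(x) == clf(x @ R.T)))

print(invariance_score(invariant, x, R))  # 1.0
print(invariance_score(sensitive, x, R))  # < 1.0
```

Correlating such scores with held-out accuracy across test sets is one way to probe the relationship the paper describes.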
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
We also theoretically analyze the method for Gaussian data, bounding the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.