Training Dynamics of Learning 3D-Rotational Equivariance
- URL: http://arxiv.org/abs/2512.02303v1
- Date: Tue, 02 Dec 2025 00:48:09 GMT
- Title: Training Dynamics of Learning 3D-Rotational Equivariance
- Authors: Max W. Shen, Ewa Nowara, Michael Maser, Kyunghyun Cho,
- Abstract summary: We investigate how quickly symmetry-agnostic models learn to respect symmetries.<n>For 3D rotations, the loss penalty for non-equivariant models is small throughout training.
- Score: 36.537155535159435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While data augmentation is widely used to train symmetry-agnostic models, it remains unclear how quickly and effectively they learn to respect symmetries. We investigate this by deriving a principled measure of equivariance error that, for convex losses, calculates the percent of total loss attributable to imperfections in learned symmetry. We focus our empirical investigation to 3D-rotation equivariance on high-dimensional molecular tasks (flow matching, force field prediction, denoising voxels) and find that models reduce equivariance error quickly to $\leq$2\% held-out loss within 1k-10k training steps, a result robust to model and dataset size. This happens because learning 3D-rotational equivariance is an easier learning task, with a smoother and better-conditioned loss landscape, than the main prediction task. For 3D rotations, the loss penalty for non-equivariant models is small throughout training, so they may achieve lower test loss than equivariant models per GPU-hour unless the equivariant ``efficiency gap'' is narrowed. We also experimentally and theoretically investigate the relationships between relative equivariance error, learning gradients, and model parameters.
Related papers
- Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining.<n>PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead.<n>We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z) - Understanding Data Influence with Differential Approximation [63.817689230826595]
We introduce a new formulation to approximate a sample's influence by accumulating the differences in influence between consecutive learning steps, which we term Diff-In.<n>By employing second-order approximations, we approximate these difference terms with high accuracy while eliminating the need for model convexity required by existing methods.<n>Our theoretical analysis demonstrates that Diff-In achieves significantly lower approximation error compared to existing influence estimators.
arXiv Detail & Related papers (2025-08-20T11:59:32Z) - Do we need equivariant models for molecule generation? [2.336105667374686]
We investigate whether non-equivariant convolutional neural networks (CNNs) trained with rotation augmentations can learn equivariance and match the performance of equivariant models.<n>To our knowledge, this is the first study to analyze learned equivariance in generative tasks.
arXiv Detail & Related papers (2025-07-13T19:16:11Z) - Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion [55.95767828747407]
In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model.<n>We present a framework that reduces training variance and provides a provably lower-variance gradient estimator.<n>We also present a practical implementation of this estimator incorporating the loss and sampling procedure through a method we call Orbit Diffusion.
arXiv Detail & Related papers (2025-02-14T03:26:57Z) - Does equivariance matter at scale? [15.247352029530523]
We study how equivariant and non-equivariant networks scale with compute and training samples.<n>First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs.<n>Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget.
arXiv Detail & Related papers (2024-10-30T16:36:59Z) - Relaxed Equivariance via Multitask Learning [7.905957228045955]
We introduce REMUL, a training procedure for approximating equivariance with multitask learning.<n>We show that unconstrained models can learn approximate symmetries by minimizing an additional simple equivariance loss.<n>Our method achieves competitive performance compared to equivariant baselines while being $10 times$ faster at inference and $2.5 times$ at training.
arXiv Detail & Related papers (2024-10-23T13:50:27Z) - The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
For example, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss who can achieve trend-level alignment with SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - Memorizing without overfitting: Bias, variance, and interpolation in
over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z) - What causes the test error? Going beyond bias-variance via ANOVA [21.359033212191218]
Modern machine learning methods are often overparametrized, allowing adaptation to the data at a fine level.
Recent work aimed to understand in greater depth why overparametrization is helpful for generalization.
We propose using the analysis of variance (ANOVA) to decompose the variance in the test error in a symmetric way.
arXiv Detail & Related papers (2020-10-11T05:21:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.