Related papers: A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models

A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models

URL: http://arxiv.org/abs/2506.02269v1
Date: Mon, 02 Jun 2025 21:15:36 GMT
Title: A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
Authors: YuQing Xie, Tess Smidt,
Abstract summary: We show that the parameter symmetries of the unconstrained model has non effects on the loss landscape of the equivariant subspace.<n>Surprisingly, the weights eventually found via relaxation corresponds to a different choice of group representation in the hidden layer.
Score: 0.6475999521931204
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Equivariant neural networks have proven to be effective for tasks with known underlying symmetries. However, optimizing equivariant networks can be tricky and best training practices are less established than for standard networks. In particular, recent works have found small training benefits from relaxing equivariance constraints. This raises the question: do equivariance constraints introduce fundamental obstacles to optimization? Or do they simply require different hyperparameter tuning? In this work, we investigate this question through a theoretical analysis of the loss landscape geometry. We focus on networks built using permutation representations, which we can view as a subset of unconstrained MLPs. Importantly, we show that the parameter symmetries of the unconstrained model has nontrivial effects on the loss landscape of the equivariant subspace and under certain conditions can provably prevent learning of the global minima. Further, we empirically demonstrate in such cases, relaxing to an unconstrained MLP can sometimes solve the issue. Interestingly, the weights eventually found via relaxation corresponds to a different choice of group representation in the hidden layer. From this, we draw 3 key takeaways. (1) Viewing any class of networks in the context of larger unconstrained function space can give important insights on loss landscape structure. (2) Within the unconstrained function space, equivariant networks form a complicated union of linear hyperplanes, each associated with a specific choice of internal group representation. (3) Effective relaxation of equivariance may require not only adding nonequivariant degrees of freedom, but also rethinking the fixed choice of group representations in hidden layers.

Related papers

Parameter-free approximate equivariance for tasks with finite group symmetry [15.964726158869777]
Equivariant neural networks incorporate symmetries through group actions, embedding them as an inductive bias to improve performance on a wide variety of tasks.<n>We propose a simple zero- parameter approach that imposes approximate equivariance for a finite group in the latent representation, as an additional term in the loss function.<n>We benchmark our approach on three datasets and compare it against several existing equivariant methods, showing that it achieves similar or better performance for a fraction of the parameters.
arXiv Detail & Related papers (2025-06-09T21:23:26Z)
Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations [3.0222726571099665]
We show that non-Siamese layers can improve performance in tasks like graph anomaly detection, weight space alignment, and learning Wasserstein distances.<n>We also show empirically that these additional non-Siamese layers can improve performance in tasks like graph anomaly detection, weight space alignment, and learning Wasserstein distances.
arXiv Detail & Related papers (2024-10-09T08:19:31Z)
Improving Equivariant Model Training via Constraint Relaxation [31.507956579770088]
We propose a novel framework for improving the optimization of such models by relaxing the hard equivariance constraint during training.<n>We provide experimental results on different state-of-the-art network architectures, demonstrating how this training framework can result in equivariant models with improved generalization performance.
arXiv Detail & Related papers (2024-08-23T17:35:08Z)
Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
Symmetry Breaking and Equivariant Neural Networks [17.740760773905986]
We introduce a novel notion of'relaxed equiinjection' We show how to incorporate this relaxation into equivariant multilayer perceptronrons (E-MLPs) The relevance of symmetry breaking is then discussed in various application domains.
arXiv Detail & Related papers (2023-12-14T15:06:48Z)
Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z)
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss. We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance [48.39751410262664]
ArtEq is a part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. Experimental results show that ArtEq generalizes to poses not seen during training, outperforming state-of-the-art methods by 44% in terms of body reconstruction accuracy.
arXiv Detail & Related papers (2023-04-20T17:58:26Z)
Relaxing Equivariance Constraints with Non-stationary Continuous Filters [20.74154804898478]
The proposed parameterization can be thought of as a building block to allow adjustable symmetry structure in neural networks. Compared to non-equivariant or strict-equivariant baselines, we experimentally verify that soft equivariance leads to improved performance in terms of test accuracy on CIFAR-10 and CIFAR-100 image classification tasks.
arXiv Detail & Related papers (2022-04-14T18:08:36Z)
Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals. Minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance. We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)
Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters. We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning [90.20563679417567]
This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. We show that such networks converge faster than unstructured networks on CartPole, a grid world and Pong.
arXiv Detail & Related papers (2020-06-30T15:38:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.