Related papers: Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks

Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks

URL: http://arxiv.org/abs/2408.05496v1
Date: Sat, 10 Aug 2024 09:06:34 GMT
Title: Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks
Authors: Yoav Gelberg, Tycho F. A. van der Ouderaa, Mark van der Wilk, Yarin Gal,
Abstract summary: We investigate the impact of weight space permutation symmetries on variational inference. We devise a symmetric symmetrization mechanism for constructing permutation invariant variational posteriors. We show that the symmetrized distribution has a strictly better fit to the true posterior, and that it can be trained using the original ELBO objective.
Score: 43.88179780450706
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weight space symmetries in neural network architectures, such as permutation symmetries in MLPs, give rise to Bayesian neural network (BNN) posteriors with many equivalent modes. This multimodality poses a challenge for variational inference (VI) techniques, which typically rely on approximating the posterior with a unimodal distribution. In this work, we investigate the impact of weight space permutation symmetries on VI. We demonstrate, both theoretically and empirically, that these symmetries lead to biases in the approximate posterior, which degrade predictive performance and posterior fit if not explicitly accounted for. To mitigate this behavior, we leverage the symmetric structure of the posterior and devise a symmetrization mechanism for constructing permutation invariant variational posteriors. We show that the symmetrized distribution has a strictly better fit to the true posterior, and that it can be trained using the original ELBO objective with a modified KL regularization term. We demonstrate experimentally that our approach mitigates the aforementioned biases and results in improved predictions and a higher ELBO.

Related papers

Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths.<n>Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope.<n>We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps.<n>This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
Improving Equivariant Networks with Probabilistic Symmetry Breaking [9.164167226137664]
Equivariant networks encode known symmetries into neural networks, often enhancing generalizations. This poses an important problem, both (1) for prediction tasks on domains where self-symmetries are common, and (2) for generative models, which must break symmetries in order to reconstruct from highly symmetric latent spaces. We present novel theoretical results that establish sufficient conditions for representing such distributions.
arXiv Detail & Related papers (2025-03-27T21:04:49Z)
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach [32.718944771397666]
A common characteristic in integer linear programs (ILPs) is symmetry, allowing variables to be permuted without altering the underlying problem structure. Classic GNN architectures struggle to differentiate between symmetric variables, which limits their predictive accuracy. We develop an orbit-based augmentation scheme that first groups symmetric variables and then samples augmented features for each group from a discrete uniform distribution.
arXiv Detail & Related papers (2025-01-24T03:33:33Z)
Approximate Equivariance in Reinforcement Learning [35.04248486334824]
Equivariant neural networks have shown great success in reinforcement learning. In many problems, only approximate symmetry is present, which makes imposing exact symmetry inappropriate. We develop approximately equivariant algorithms in reinforcement learning.
arXiv Detail & Related papers (2024-11-06T19:44:46Z)
Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching. We introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
Reparameterization invariance in approximate Bayesian inference [32.88960624085645]
We develop a new geometric view of reparametrizations from which we explain the success of linearization. We demonstrate that these re parameterization invariance properties can be extended to the original neural network predictive.
arXiv Detail & Related papers (2024-06-05T14:49:15Z)
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs) Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models Lattice Boltzmann collision operators. Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
Symmetry Breaking and Equivariant Neural Networks [17.740760773905986]
We introduce a novel notion of'relaxed equiinjection' We show how to incorporate this relaxation into equivariant multilayer perceptronrons (E-MLPs) The relevance of symmetry breaking is then discussed in various application domains.
arXiv Detail & Related papers (2023-12-14T15:06:48Z)
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance [16.49488981364657]
We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. We use an arbitrary base model such as anvariant or a transformer and symmetrize it to be equivariant to the given group. Empirical tests show competitive results against tailored equivariant architectures.
arXiv Detail & Related papers (2023-06-05T13:40:54Z)
Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset. The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function. The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z)
Equivariant neural networks for inverse problems [1.7942265700058986]
We show that group equivariant convolutional operations can naturally be incorporated into learned reconstruction methods. We design learned iterative methods in which the proximal operators are modelled as group equivariant convolutional neural networks.
arXiv Detail & Related papers (2021-02-23T05:38:41Z)
Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters. We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.