On permutation symmetries in Bayesian neural network posteriors: a
variational perspective
- URL: http://arxiv.org/abs/2310.10171v1
- Date: Mon, 16 Oct 2023 08:26:50 GMT
- Title: On permutation symmetries in Bayesian neural network posteriors: a
variational perspective
- Authors: Simone Rossi, Ankit Singh, Thomas Hannagan
- Abstract summary: We show that there is essentially no loss barrier between the local solutions of gradient descent.
This raises questions for approximate inference in Bayesian neural networks.
We propose a matching algorithm to search for linearly connected solutions.
- Score: 8.310462710943971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The elusive nature of gradient-based optimization in neural networks is tied
to their loss landscape geometry, which is poorly understood. However, recent
work has brought solid evidence that there is essentially no loss barrier
between the local solutions of gradient descent, once accounting for
weight-permutations that leave the network's computation unchanged. This raises
questions for approximate inference in Bayesian neural networks (BNNs), where
we are interested in marginalizing over multiple points in the loss landscape.
In this work, we first extend the formalism of marginalized loss barrier and
solution interpolation to BNNs, before proposing a matching algorithm to search
for linearly connected solutions. This is achieved by aligning the
distributions of two independent approximate Bayesian solutions with respect to
permutation matrices. We build on the results of Ainsworth et al. (2023),
reframing the problem as a combinatorial optimization one, using an
approximation to the sum of bilinear assignment problem. We then experiment on
a variety of architectures and datasets, finding nearly zero marginalized loss
barriers for linearly connected solutions.
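A minimal sketch of the kind of layer-wise permutation matching the abstract describes, under assumptions the abstract does not fix: a single hidden layer, alignment on the variational mean weights only, and the assignment solved exactly with SciPy's Hungarian solver rather than the paper's approximation to the sum of bilinear assignment problem.

```python
# Hypothetical sketch: align the hidden units of two independently fitted
# two-layer MLPs by a permutation that leaves each network's function
# unchanged. For a Bayesian network the arrays below would be the means of
# the variational posterior; the paper's actual algorithm matches whole
# distributions and handles deeper networks.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_units(a, b):
    """Return perm such that reordering B's hidden units by perm aligns them with A.

    a, b: dicts with mean weights 'W1' (H x D), 'b1' (H,), 'W2' (C x H).
    """
    # Similarity between A's unit i and B's unit j, accumulated over every
    # parameter attached to that hidden unit.
    sim = a["W1"] @ b["W1"].T            # incoming weights
    sim += np.outer(a["b1"], b["b1"])    # biases
    sim += a["W2"].T @ b["W2"]           # outgoing weights
    _, perm = linear_sum_assignment(sim, maximize=True)
    return perm

def apply_permutation(b, perm):
    """Permute B's parameters; the permuted network computes the same function."""
    return {"W1": b["W1"][perm], "b1": b["b1"][perm], "W2": b["W2"][:, perm]}
```

With more than one hidden layer the per-layer permutations interact through the shared weight matrices, which is why the matching becomes a (sum of) bilinear assignment problem rather than a set of independent linear assignments.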
Related papers
- Exploring the loss landscape of regularized neural networks via convex duality [42.48510370193192]
We discuss several aspects of the loss landscape of regularized neural networks.
We first characterize the solution set of the convex problem using its dual and further characterize all stationary points.
We show that the solution set characterization and connectivity results can be extended to different architectures.
arXiv Detail & Related papers (2024-11-12T11:41:38Z)
- GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent [12.409030267572243]
We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem.
Based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem.
This is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free.
arXiv Detail & Related papers (2024-09-26T03:12:53Z)
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Learning through atypical "phase transitions" in overparameterized neural networks [0.43496401697112685]
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear.
Yet they can fit data almost perfectly through variants of gradient descent and achieve unexpected levels of prediction accuracy without overfitting.
These results pose formidable challenges for understanding generalization.
arXiv Detail & Related papers (2021-10-01T23:28:07Z)
- Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected (see the barrier-evaluation sketch after this list).
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
- PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries [4.620110353542715]
We present PFNN, a penalty-free neural network method, to solve a class of second-order boundary-value problems.
PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.
arXiv Detail & Related papers (2020-04-14T13:36:14Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
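Relating to the mode-connectivity entry above, a minimal sketch of how a loss barrier can be estimated once two solutions have been aligned: evaluate the loss along the straight line between them and subtract the linear interpolation of the endpoint losses. The `loss_fn` callable and NumPy parameter dictionaries are placeholder assumptions, not the papers' implementations.

```python
# Hypothetical sketch: barrier of the loss along the linear path between
# two permutation-aligned parameter sets. For a marginalized barrier in the
# BNN setting, loss_fn could average over samples drawn from the
# interpolated approximate posterior instead of evaluating a single
# weight vector.
import numpy as np

def loss_barrier(params_a, params_b, loss_fn, n_points=25):
    """Max over t of loss(interpolated params) minus the endpoint interpolation."""
    ts = np.linspace(0.0, 1.0, n_points)
    losses = []
    for t in ts:
        interp = {k: (1.0 - t) * params_a[k] + t * params_b[k] for k in params_a}
        losses.append(loss_fn(interp))
    losses = np.asarray(losses)
    baseline = (1.0 - ts) * losses[0] + ts * losses[-1]
    return float(np.max(losses - baseline))
```

A barrier near zero across the interpolation path is the empirical signature of linear mode connectivity reported in the abstract above.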