Related papers: On permutation symmetries in Bayesian neural network posteriors: a variational perspective

URL: http://arxiv.org/abs/2310.10171v1
Date: Mon, 16 Oct 2023 08:26:50 GMT
Title: On permutation symmetries in Bayesian neural network posteriors: a variational perspective
Authors: Simone Rossi, Ankit Singh, Thomas Hannagan
Abstract summary: We show that there is essentially no loss barrier between the local solutions of gradient descent. This raises questions for approximate inference in Bayesian neural networks. We propose a matching algorithm to search for linearly connected solutions.
Score: 8.310462710943971
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The elusive nature of gradient-based optimization in neural networks is tied to their loss landscape geometry, which is poorly understood. However, recent work has brought solid evidence that there is essentially no loss barrier between the local solutions of gradient descent, once accounting for weight permutations that leave the network's computation unchanged. This raises questions for approximate inference in Bayesian neural networks (BNNs), where we are interested in marginalizing over multiple points in the loss landscape. In this work, we first extend the formalism of marginalized loss barrier and solution interpolation to BNNs, before proposing a matching algorithm to search for linearly connected solutions. This is achieved by aligning the distributions of two independent approximate Bayesian solutions with respect to permutation matrices. We build on the results of Ainsworth et al. (2023), reframing the problem as a combinatorial optimization one, using an approximation to the sum of bilinear assignment problem. We then experiment on a variety of architectures and datasets, finding nearly zero marginalized loss barriers for linearly connected solutions.
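The permutation symmetry mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's matching algorithm: it only shows, for a hypothetical two-layer ReLU network, that reordering hidden units leaves the output unchanged, and that undoing the permutation before interpolating recovers the original weights. All names (`mlp`, `permute_hidden`, `interpolate`) and the layer sizes are assumptions chosen for illustration.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and b1 move together with columns of W2."""
    return W1[perm], b1[perm], W2[:, perm]

def interpolate(theta_a, theta_b, t):
    """Linear interpolation (1 - t) * theta_a + t * theta_b, parameter by parameter."""
    return tuple((1 - t) * a + t * b for a, b in zip(theta_a, theta_b))

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3  # illustrative sizes, not from the paper
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)
x = rng.normal(size=d_in)

# A random permutation of the hidden units gives different weights, same function.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

# Aligning the permuted copy with the inverse permutation recovers the original
# weights, so the interpolation midpoint between the two is the original network.
inv = np.argsort(perm)
W1a, b1a, W2a = permute_hidden(W1p, b1p, W2p, inv)
aligned_mid = interpolate((W1, b1, W2, b2), (W1a, b1a, W2a, b2), 0.5)
aligned_midpoint_matches = np.allclose(mlp(x, *aligned_mid), mlp(x, W1, b1, W2, b2))
```

In this toy case the correct permutation is known by construction. For two independently trained networks it is not, which is why a search over permutations, such as the matching algorithm the abstract describes, is needed.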

Related papers

Exploring the loss landscape of regularized neural networks via convex duality [42.48510370193192]
We discuss several aspects of the loss landscape of regularized neural networks. We first characterize the solution set of the convex problem using its dual and further characterize all stationary points. We show that the solution set characterization and connectivity results can be extended to different architectures.
arXiv Detail & Related papers (2024-11-12T11:41:38Z)
GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent [12.409030267572243]
We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem. Based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem. This is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free.
arXiv Detail & Related papers (2024-09-26T03:12:53Z)
LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce the popular positive linear satisfiability to neural networks. We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent. We show that SGD is biased towards a simple solution. We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
Learning through atypical "phase transitions" in overparameterized neural networks [0.43496401697112685]
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through gradient descent algorithms and achieve unexpected prediction accuracy. These are formidable challenges without generalization.
arXiv Detail & Related papers (2021-10-01T23:28:07Z)
Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network. We show that our approach is much more robust than explicit differentiation of the eigendecomposition. Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
PFNN: A Penalty-Free Neural Network Method for Solving a Class of Second-Order Boundary-Value Problems on Complex Geometries [4.620110353542715]
We present PFNN, a penalty-free neural network method, to solve a class of second-order boundary-value problems. PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.
arXiv Detail & Related papers (2020-04-14T13:36:14Z)
Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set. In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.