Optimizing Mode Connectivity via Neuron Alignment
- URL: http://arxiv.org/abs/2009.02439v2
- Date: Mon, 2 Nov 2020 23:56:40 GMT
- Title: Optimizing Mode Connectivity via Neuron Alignment
- Authors: N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna
Sattigeri, Rongjie Lai
- Abstract summary: Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
- Score: 84.26606622400423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The loss landscapes of deep neural networks are not well understood due to
their high nonconvexity. Empirically, the local minima of these loss functions
can be connected by a learned curve in model space, along which the loss
remains nearly constant; a feature known as mode connectivity. Yet, current
curve finding algorithms do not consider the influence of symmetry in the loss
surface created by model weight permutations. We propose a more general
framework to investigate the effect of symmetry on landscape connectivity by
accounting for the weight permutations of the networks being connected. To
approximate the optimal permutation, we introduce an inexpensive heuristic
referred to as neuron alignment. Neuron alignment promotes similarity between
the distribution of intermediate activations of models along the curve. We
provide theoretical analysis establishing the benefit of alignment to mode
connectivity based on this simple heuristic. We empirically verify that the
permutation given by alignment is locally optimal via a proximal alternating
minimization scheme. Empirically, optimizing the weight permutation is critical
for efficiently learning a simple, planar, low-loss curve between networks that
successfully generalizes. Our alignment method can significantly alleviate the
recently identified robust loss barrier on the path connecting two adversarially
robust models and find more robust and accurate models on the path.
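For intuition, the alignment heuristic is simple enough to sketch. The snippet below is a minimal illustration for a two-layer ReLU MLP, not the authors' released implementation; the function names (align_two_layer_mlp, bezier_point) and the dict-of-arrays weight format are assumptions made here for brevity. Hidden units of the two endpoint networks are matched by maximizing the correlation of their post-activation responses on a shared batch, the matching is solved with the Hungarian algorithm, and the resulting permutation is applied consistently to the second network's weights. A curve between the aligned endpoints, e.g. a quadratic Bezier with a trainable control point, can then be optimized so the loss stays low along the path.

```python
# Minimal sketch of neuron alignment for a two-layer ReLU MLP
# (illustrative only; not the authors' released code).
import numpy as np
from scipy.optimize import linear_sum_assignment


def relu(x):
    return np.maximum(x, 0.0)


def align_two_layer_mlp(theta_a, theta_b, X):
    """Permute the hidden units of model B so that their post-activation
    responses on the batch X correlate maximally with those of model A.

    theta_* : dict with 'W1' (d_in, h), 'b1' (h,), 'W2' (h, d_out).
    X       : (n, d_in) batch used to measure intermediate activations.
    """
    # Hidden activations of both endpoint models on the same inputs.
    H_a = relu(X @ theta_a["W1"] + theta_a["b1"])          # (n, h)
    H_b = relu(X @ theta_b["W1"] + theta_b["b1"])          # (n, h)

    # Cross-correlation matrix between hidden units of A and of B.
    Za = (H_a - H_a.mean(0)) / (H_a.std(0) + 1e-8)
    Zb = (H_b - H_b.mean(0)) / (H_b.std(0) + 1e-8)
    corr = Za.T @ Zb / X.shape[0]                          # (h, h)

    # Assignment maximizing total correlation (Hungarian algorithm).
    _, perm = linear_sum_assignment(-corr)

    # Apply the permutation consistently to incoming and outgoing weights,
    # so the aligned copy of B computes exactly the same function.
    theta_b_aligned = {
        "W1": theta_b["W1"][:, perm],
        "b1": theta_b["b1"][perm],
        "W2": theta_b["W2"][perm, :],
    }
    return theta_b_aligned, perm


def bezier_point(theta_a, theta_b, theta_c, t):
    """Weights at position t in [0, 1] on a quadratic Bezier curve between
    the aligned endpoints; the control point theta_c is what curve finding
    trains to keep the loss low along the path."""
    return {
        k: (1 - t) ** 2 * theta_a[k]
        + 2 * t * (1 - t) * theta_c[k]
        + t**2 * theta_b[k]
        for k in theta_a
    }
```

Because the permutation is applied to both the incoming and outgoing weights of the hidden layer, the aligned copy of the second network realizes exactly the same function as before; only its coordinates in weight space change, which is what makes a low-loss connecting curve easier to find.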
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which moves the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batchnorm parameters, starting partway through training, matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks [21.69222364939501]
In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized at zero.
We show that the path interpolates continuously between the so-called kernel and rich regimes.
arXiv Detail & Related papers (2023-03-31T05:32:11Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent delineates the behaviour of models depending on the regime they belong to.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off [10.182922771556742]
We present a nonasymptotic generalization theory for two-layer neural networks with ReLU activation function.
We show that overparametrized random feature models suffer from the curse of dimensionality and thus are suboptimal.
arXiv Detail & Related papers (2021-06-09T03:52:18Z)
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets [49.87919454950763]
Skip connection is a widely-used technique to improve the performance of deep neural networks.
In this work, we investigate how the scale factors affect the effectiveness of the skip connection.
arXiv Detail & Related papers (2021-05-15T11:44:49Z)
- On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes [0.0]
We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes.
We prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima.
arXiv Detail & Related papers (2020-11-06T11:34:59Z)
- Neural Control Variates [71.42768823631918]
We show that a set of neural networks can face the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)