What can linear interpolation of neural network loss landscapes tell us?
- URL: http://arxiv.org/abs/2106.16004v1
- Date: Wed, 30 Jun 2021 11:54:04 GMT
- Title: What can linear interpolation of neural network loss landscapes tell us?
- Authors: Tiffany Vlaar and Jonathan Frankle
- Abstract summary: Loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion.
One common way to address this problem is to plot linear slices of the landscape.
- Score: 11.753360538833139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Studying neural network loss landscapes provides insights into the nature of
the underlying optimization problems. Unfortunately, loss landscapes are
notoriously difficult to visualize in a human-comprehensible fashion. One
common way to address this problem is to plot linear slices of the landscape,
for example from the initial state of the network to the final state after
optimization. On the basis of this analysis, prior work has drawn broader
conclusions about the difficulty of the optimization problem. In this paper, we
put inferences of this kind to the test, systematically evaluating how linear
interpolation and final performance vary when altering the data, choice of
initialization, and other optimizer and architecture design choices. Further,
we use linear interpolation to study the role played by individual layers and
substructures of the network. We find that certain layers are more sensitive to
the choice of initialization and optimizer hyperparameter settings, and we
exploit these observations to design custom optimization schemes. However, our
results cast doubt on the broader intuition that the presence or absence of
barriers when interpolating necessarily relates to the success of optimization.
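Below is a minimal sketch, not code from the paper, of the kind of linear interpolation the abstract describes: evaluating the loss along the straight line between an initial and a final set of weights. It assumes PyTorch; `model`, `loss_fn` (mean-reduced), `loader`, and the two `state_dict`s are user-supplied placeholders.

```python
# Sketch: loss along theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final.
# `model`, `loss_fn`, `loader`, `init_state`, `final_state` are placeholders.
import copy

import torch


def interpolation_losses(model, init_state, final_state, loss_fn, loader,
                         num_points=21, device="cpu"):
    """Return the mean loss at evenly spaced points on the line segment
    between two saved state_dicts (e.g. before and after training)."""
    probe = copy.deepcopy(model).to(device)
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        # Blend floating-point parameters/buffers; copy integer buffers
        # (e.g. BatchNorm step counters) from the final state unchanged.
        blended = {}
        for name, v0 in init_state.items():
            v1 = final_state[name]
            if v0.is_floating_point():
                blended[name] = ((1 - alpha) * v0 + alpha * v1).to(device)
            else:
                blended[name] = v1.to(device)
        probe.load_state_dict(blended)
        probe.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(probe(x), y).item() * x.size(0)
                count += x.size(0)
        losses.append(total / count)
    return losses
```

The resulting curve over alpha in [0, 1] is the linear slice the paper studies; per the abstract, the presence or absence of a barrier along it should not be over-interpreted as a measure of optimization difficulty.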
Related papers
- Large-scale global optimization of ultra-high dimensional non-convex
landscapes based on generative neural networks [0.0]
We present an algorithm for ultra-high dimensional optimization based on a deep generative network.
We show that our method performs better with fewer function evaluations compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2023-07-09T00:05:59Z) - No Wrong Turns: The Simple Geometry Of Neural Networks Optimization
Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of quantities sampled along two key optimization paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv Detail & Related papers (2023-06-20T22:10:40Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation; a minimal sketch of the unrolling idea appears after this list.
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Combinatorial Optimization for Panoptic Segmentation: An End-to-End
Trainable Approach [23.281726932718232]
We propose an end-to-end trainable architecture for simultaneous semantic and instance segmentation.
Our approach shows the utility of using optimization in tandem with deep learning on a challenging, large-scale real-world problem.
arXiv Detail & Related papers (2021-06-06T17:39:13Z) - Visualizing High-Dimensional Trajectories on the Loss-Landscape of ANNs [15.689418447376587]
Training artificial neural networks requires the optimization of highly non-convex loss functions.
Visualization tools have played a key role in uncovering the geometric characteristics of the loss landscape of ANNs.
We propose a dimensionality reduction method that represents the state of the art in terms of both local and global structure.
arXiv Detail & Related papers (2021-01-31T16:30:50Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating prediction accuracy and network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in network complexity are possible with negligible loss of accuracy.
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - A Flexible Framework for Designing Trainable Priors with Adaptive
Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of
DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize training, but sometimes results in the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
arXiv Detail & Related papers (2020-02-25T11:40:27Z)
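For the algorithm-unrolling strategy mentioned in the "Backpropagation of Unrolled Solvers" entry above, here is a minimal, hypothetical PyTorch sketch (not the paper's folded-optimization method): an inner ridge-regression problem is solved by unrolled gradient descent, and autograd differentiates through the unrolled iterations to tune an outer regularization weight `lam`. All data and parameter names are illustrative.

```python
import torch


def unrolled_inner_solver(A, b, lam, steps=50, lr=0.01):
    """Approximately solve min_x 0.5*||Ax - b||^2 + 0.5*lam*||x||^2 by plain
    gradient descent; every iteration stays on the autograd tape, so gradients
    can flow back to `lam` through the unrolled loop."""
    x = torch.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b) + lam * x  # gradient of the inner objective
        x = x - lr * grad                   # out-of-place update keeps the graph
    return x


# Outer problem: choose `lam` so the inner solution matches a target vector,
# differentiating through the unrolled solver (synthetic data for illustration).
torch.manual_seed(0)
A, b = torch.randn(20, 5), torch.randn(20)
x_target = torch.randn(5)

lam = torch.tensor(0.5, requires_grad=True)
opt = torch.optim.Adam([lam], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    x_hat = unrolled_inner_solver(A, b, lam)
    outer_loss = ((x_hat - x_target) ** 2).sum()
    outer_loss.backward()  # backprop through all unrolled inner iterations
    opt.step()
```

This is only the generic unrolling baseline that the cited paper analyzes and improves upon; its folded-optimization contribution is not reproduced here.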