Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks
- URL: http://arxiv.org/abs/2003.00307v2
- Date: Wed, 26 May 2021 19:22:33 GMT
- Title: Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks
- Authors: Chaoyue Liu, Libin Zhu, Mikhail Belkin
- Abstract summary: We show that wide neural networks satisfy the PL$^*$ condition, which explains the (S)GD convergence to a global minimum.
- Score: 20.44438519046223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning is due, to a large extent, to the remarkable
effectiveness of gradient-based optimization methods applied to large neural
networks. The purpose of this work is to propose a modern view and a general
mathematical framework for loss landscapes and efficient optimization in
over-parameterized machine learning models and systems of non-linear equations,
a setting that includes over-parameterized deep neural networks. Our starting
observation is that optimization problems corresponding to such systems are
generally not convex, even locally. We argue that instead they satisfy PL$^*$,
a variant of the Polyak-Lojasiewicz condition on most (but not all) of the
parameter space, which guarantees both the existence of solutions and efficient
optimization by (stochastic) gradient descent (SGD/GD). The PL$^*$ condition of
these systems is closely related to the condition number of the tangent kernel
associated to a non-linear system, showing how a PL$^*$-based non-linear theory
parallels classical analyses of over-parameterized linear equations. We show
that wide neural networks satisfy the PL$^*$ condition, which explains the
(S)GD convergence to a global minimum. Finally we propose a relaxation of the
PL$^*$ condition applicable to "almost" over-parameterized systems.
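As a reading aid, here is a minimal sketch of the PL$^*$ condition and the convergence guarantee it yields; the constants and assumptions are paraphrased from the abstract rather than quoted from the paper. For a non-negative loss $\mathcal{L}$ and a subset $S$ of parameter space, say that $\mathcal{L}$ is $\mu$-PL$^*$ on $S$ if

$$ \tfrac{1}{2}\,\|\nabla \mathcal{L}(w)\|^2 \;\ge\; \mu\,\mathcal{L}(w) \qquad \text{for all } w \in S. $$

For the square loss $\mathcal{L}(w) = \tfrac{1}{2}\|F(w) - y\|^2$ of a non-linear system $F(w) = y$, we have $\nabla \mathcal{L}(w) = DF(w)^\top (F(w) - y)$, so

$$ \tfrac{1}{2}\,\|\nabla \mathcal{L}(w)\|^2 \;=\; \tfrac{1}{2}\,(F(w)-y)^\top K(w)\,(F(w)-y) \;\ge\; \lambda_{\min}(K(w))\,\mathcal{L}(w), $$

where $K(w) = DF(w)\,DF(w)^\top$ is the tangent kernel. A uniform bound $\lambda_{\min}(K(w)) \ge \mu$ on $S$ therefore gives the $\mu$-PL$^*$ condition, which is how the spectrum of the tangent kernel enters. If $\mathcal{L}$ is also $\beta$-smooth and the gradient descent iterates remain in $S$, gradient descent with step size $\eta \le 1/\beta$ converges linearly: $\mathcal{L}(w_t) \le (1 - \eta\mu)^t\,\mathcal{L}(w_0)$.

The inequality linking the gradient norm to the smallest tangent-kernel eigenvalue is easy to check numerically. The snippet below is an illustrative sketch, not code from the paper: the toy model `F`, the finite-difference Jacobian, and all sizes are assumptions chosen for the example.

```python
# Hypothetical illustration (not from the paper): for the square loss
# L(w) = 0.5 * ||F(w) - y||^2 with tangent kernel K(w) = J(w) J(w)^T,
# check that 0.5 * ||grad L(w)||^2 >= lambda_min(K(w)) * L(w).
import numpy as np

def F(w, X):
    """Toy over-parameterized model: one tanh hidden layer, output weights fixed to ones."""
    n, d = X.shape
    m = w.size // d                      # number of hidden units
    W = w.reshape(m, d)
    return np.tanh(X @ W.T).sum(axis=1) / np.sqrt(m)

def jacobian(w, X, eps=1e-6):
    """Finite-difference Jacobian of F with respect to w, shape (n, p)."""
    base = F(w, X)
    J = np.zeros((base.size, w.size))
    for i in range(w.size):
        w_pert = w.copy()
        w_pert[i] += eps
        J[:, i] = (F(w_pert, X) - base) / eps
    return J

rng = np.random.default_rng(0)
n, d, m = 10, 5, 200                     # p = m*d = 1000 parameters >> n = 10 equations
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(m * d)

J = jacobian(w, X)
K = J @ J.T                              # tangent kernel, an (n, n) matrix
residual = F(w, X) - y
loss = 0.5 * residual @ residual         # square loss L(w)
grad = J.T @ residual                    # gradient of the square loss (via the same J)
mu = np.linalg.eigvalsh(K).min()         # smallest tangent-kernel eigenvalue

# PL*-type bound at this w: the left-hand side should dominate the right-hand side.
print(0.5 * grad @ grad, ">=", mu * loss)
```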
Related papers
- Learning to optimize with convergence guarantees using nonlinear system theory [0.4143603294943439]
We propose an unconstrained parametrization of algorithms for smooth objective functions.
Notably, our framework is directly compatible with automatic differentiation tools.
arXiv Detail & Related papers (2024-03-14T13:40:26Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in practice.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex objective can be characterized as the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Improved Initialization of State-Space Artificial Neural Networks [0.0]
The identification of black-box nonlinear state-space models requires a flexible representation of the state and output equation.
This paper introduces an improved approach for nonlinear state-space models represented as a recurrent artificial neural network.
arXiv Detail & Related papers (2021-03-26T15:16:08Z) - NTopo: Mesh-free Topology Optimization using Implicit Neural
Representations [35.07884509198916]
We present a novel machine learning approach to tackle topology optimization problems.
We use multilayer perceptrons (MLPs) to parameterize both density and displacement fields.
As we show through our experiments, a major benefit of our approach is that it enables self-supervised learning of continuous solution spaces.
arXiv Detail & Related papers (2021-02-22T05:25:22Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach can be up to 10x faster, with comparable accuracy, than the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive neural network models.
We show how the interplay between the optimization geometry and the feature space geometry determines out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their fast convergence in practice.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.