Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks
- URL: http://arxiv.org/abs/2003.00307v2
- Date: Wed, 26 May 2021 19:22:33 GMT
- Title: Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks
- Authors: Chaoyue Liu, Libin Zhu, Mikhail Belkin
- Abstract summary: We show that wide neural networks satisfy the PL$^*$ condition, which explains the (S)GD convergence to a global minimum.
- Score: 20.44438519046223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning is due, to a large extent, to the remarkable
effectiveness of gradient-based optimization methods applied to large neural
networks. The purpose of this work is to propose a modern view and a general
mathematical framework for loss landscapes and efficient optimization in
over-parameterized machine learning models and systems of non-linear equations,
a setting that includes over-parameterized deep neural networks. Our starting
observation is that optimization problems corresponding to such systems are
generally not convex, even locally. We argue that instead they satisfy PL$^*$,
a variant of the Polyak-Lojasiewicz condition on most (but not all) of the
parameter space, which guarantees both the existence of solutions and efficient
optimization by (stochastic) gradient descent (SGD/GD). The PL$^*$ condition of
these systems is closely related to the condition number of the tangent kernel
associated to a non-linear system, showing how a PL$^*$-based non-linear theory
parallels classical analyses of over-parameterized linear equations. We show
that wide neural networks satisfy the PL$^*$ condition, which explains the
(S)GD convergence to a global minimum. Finally we propose a relaxation of the
PL$^*$ condition applicable to "almost" over-parameterized systems.
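As a reading aid, here is a minimal sketch of the PL$^*$ condition and the convergence guarantee it yields; the constants and assumptions are paraphrased from the abstract rather than quoted from the paper. For a non-negative loss $\mathcal{L}$ and a subset $S$ of parameter space, say that $\mathcal{L}$ is $\mu$-PL$^*$ on $S$ if

$$ \tfrac{1}{2}\,\|\nabla \mathcal{L}(w)\|^2 \;\ge\; \mu\,\mathcal{L}(w) \qquad \text{for all } w \in S. $$

For the square loss $\mathcal{L}(w) = \tfrac{1}{2}\|F(w) - y\|^2$ of a non-linear system $F(w) = y$, we have $\nabla \mathcal{L}(w) = DF(w)^\top (F(w) - y)$, so

$$ \tfrac{1}{2}\,\|\nabla \mathcal{L}(w)\|^2 \;=\; \tfrac{1}{2}\,(F(w)-y)^\top K(w)\,(F(w)-y) \;\ge\; \lambda_{\min}(K(w))\,\mathcal{L}(w), $$

where $K(w) = DF(w)\,DF(w)^\top$ is the tangent kernel. A uniform bound $\lambda_{\min}(K(w)) \ge \mu$ on $S$ therefore gives the $\mu$-PL$^*$ condition, which is how the spectrum of the tangent kernel enters. If $\mathcal{L}$ is also $\beta$-smooth and the gradient descent iterates remain in $S$, gradient descent with step size $\eta \le 1/\beta$ converges linearly: $\mathcal{L}(w_t) \le (1 - \eta\mu)^t\,\mathcal{L}(w_0)$.

The inequality linking the gradient norm to the smallest tangent-kernel eigenvalue is easy to check numerically. The snippet below is an illustrative sketch, not code from the paper: the toy model `F`, the finite-difference Jacobian, and all sizes are assumptions chosen for the example.

```python
# Hypothetical illustration (not from the paper): for the square loss
# L(w) = 0.5 * ||F(w) - y||^2 with tangent kernel K(w) = J(w) J(w)^T,
# check that 0.5 * ||grad L(w)||^2 >= lambda_min(K(w)) * L(w).
import numpy as np

def F(w, X):
    """Toy over-parameterized model: one tanh hidden layer, output weights fixed to ones."""
    n, d = X.shape
    m = w.size // d                      # number of hidden units
    W = w.reshape(m, d)
    return np.tanh(X @ W.T).sum(axis=1) / np.sqrt(m)

def jacobian(w, X, eps=1e-6):
    """Finite-difference Jacobian of F with respect to w, shape (n, p)."""
    base = F(w, X)
    J = np.zeros((base.size, w.size))
    for i in range(w.size):
        w_pert = w.copy()
        w_pert[i] += eps
        J[:, i] = (F(w_pert, X) - base) / eps
    return J

rng = np.random.default_rng(0)
n, d, m = 10, 5, 200                     # p = m*d = 1000 parameters >> n = 10 equations
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(m * d)

J = jacobian(w, X)
K = J @ J.T                              # tangent kernel, an (n, n) matrix
residual = F(w, X) - y
loss = 0.5 * residual @ residual         # square loss L(w)
grad = J.T @ residual                    # gradient of the square loss (via the same J)
mu = np.linalg.eigvalsh(K).min()         # smallest tangent-kernel eigenvalue

# PL*-type bound at this w: the left-hand side should dominate the right-hand side.
print(0.5 * grad @ grad, ">=", mu * loss)
```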
Related papers
- Learning to optimize with convergence guarantees using nonlinear system theory [0.4143603294943439]
We propose an unconstrained parametrization of algorithms for smooth objective functions.
Notably, our framework is directly compatible with automatic differentiation tools.
arXiv Detail & Related papers (2024-03-14T13:40:26Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in practice.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex objective can be characterized as the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Improved Initialization of State-Space Artificial Neural Networks [0.0]
The identification of black-box nonlinear state-space models requires a flexible representation of the state and output equation.
This paper introduces an improved approach for nonlinear state-space models represented as a recurrent artificial neural network.
arXiv Detail & Related papers (2021-03-26T15:16:08Z) - NTopo: Mesh-free Topology Optimization using Implicit Neural
Representations [35.07884509198916]
We present a novel machine learning approach to tackle topology optimization problems.
We use multilayer perceptrons (MLPs) to parameterize both density and displacement fields.
As we show through our experiments, a major benefit of our approach is that it enables self-supervised learning of continuous solution spaces.
arXiv Detail & Related papers (2021-02-22T05:25:22Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach can be up to 10x faster, with comparable accuracy, than the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive neural network models.
We show how the interplay between the optimization geometry and the feature space geometry determines out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their fast convergence in practice.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.