The Convex Geometry of Backpropagation: Neural Network Gradient Flows
Converge to Extreme Points of the Dual Convex Program
- URL: http://arxiv.org/abs/2110.06488v1
- Date: Wed, 13 Oct 2021 04:17:08 GMT
- Title: The Convex Geometry of Backpropagation: Neural Network Gradient Flows
Converge to Extreme Points of the Dual Convex Program
- Authors: Yifei Wang, Mert Pilanci
- Abstract summary: We study non-convex subgradient flows for two-layer ReLU networks from a convex geometry and duality perspective.
We show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence.
- Score: 26.143558180103334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study non-convex subgradient flows for training two-layer ReLU neural
networks from a convex geometry and duality perspective. We characterize the
implicit bias of unregularized non-convex gradient flow as convex
regularization of an equivalent convex model. We then show that the limit
points of non-convex subgradient flows can be identified via primal-dual
correspondence in this convex optimization problem. Moreover, we derive a
sufficient condition on the dual variables which ensures that the stationary
points of the non-convex objective are the KKT points of the convex objective,
thus proving convergence of non-convex gradient flows to the global optimum.
For a class of regular training data distributions such as orthogonal separable
data, we show that this sufficient condition holds. Therefore, non-convex
gradient flows in fact converge to optimal solutions of a convex optimization
problem. We present numerical results verifying the predictions of our theory
for non-convex subgradient descent.
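For orientation, here is a minimal sketch of the two problems being related (notation is assumed here, following the convex reformulation line of work this paper builds on, and is not quoted from the abstract). The non-convex training objective for a two-layer ReLU network with $m$ neurons is
\[
\min_{\{(w_j,\alpha_j)\}_{j=1}^{m}} \; \frac{1}{2}\Big\| \sum_{j=1}^{m} (X w_j)_+ \,\alpha_j - y \Big\|_2^2 ,
\]
and the equivalent convex model, written over the finitely many ReLU activation patterns $D_1,\dots,D_P$ induced by the data matrix $X$, is
\[
\min_{\{(u_i,v_i)\}_{i=1}^{P}} \; \frac{1}{2}\Big\| \sum_{i=1}^{P} D_i X (u_i - v_i) - y \Big\|_2^2 + \beta \sum_{i=1}^{P} \big( \|u_i\|_2 + \|v_i\|_2 \big)
\quad \text{s.t.} \quad (2D_i - I_n) X u_i \ge 0, \;\; (2D_i - I_n) X v_i \ge 0 .
\]
Even though the non-convex flow is run without explicit regularization, its implicit bias corresponds to the group-sparsity regularizer in this convex program, and the limit points are matched through its dual to KKT/extreme points.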
Related papers
- Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization [19.000530691874516]
We show that many nonconvex machine learning problems satisfy a generalized-smoothness condition that extends beyond traditional smoothness assumptions.
We propose an independently-normalized gradient descent algorithm, which leverages independent sampling and normalization.
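As an illustration only, a generic normalized stochastic gradient step is sketched below; the function and parameter names are assumptions for this sketch, not the paper's exact independently-normalized update.

import numpy as np

def normalized_sgd_step(w, stochastic_grad, batch, lr=0.1, eps=1e-12):
    # Sketch: compute the stochastic gradient on an independently sampled
    # batch and rescale it to unit norm before taking the step.
    g = stochastic_grad(w, batch)
    return w - lr * g / (np.linalg.norm(g) + eps)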
arXiv Detail & Related papers (2024-10-17T21:52:00Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation (lookahead) as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
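A minimal sketch of a lookahead-style linear-interpolation scheme, with a generic inner optimizer and illustrative names assumed here (not the paper's exact algorithm):

import numpy as np

def interpolated_training(w, inner_step, k=5, alpha=0.5, outer_iters=100):
    # Run k fast inner optimizer steps, then linearly interpolate the slow
    # iterate toward the fast one; the averaging damps oscillations that a
    # nonmonotone (nonconvex-nonconcave) loss can induce.
    for _ in range(outer_iters):
        w_fast = w.copy()
        for _ in range(k):
            w_fast = inner_step(w_fast)
        w = w + alpha * (w_fast - w)
    return w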
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes this setting via a non-uniform smoothness condition.
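For concreteness, one standard form of non-uniform smoothness used in this literature (stated here for orientation) is $(L_0, L_1)$-smoothness:
\[
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| ,
\]
which recovers the classical Lipschitz-gradient assumption when $L_1 = 0$ while allowing the curvature to grow with the gradient norm.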
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Projective Proximal Gradient Descent for A Class of Nonconvex Nonsmooth Optimization Problems: Fast Convergence Without Kurdyka-Lojasiewicz (KL) Property [19.988762532185884]
Nonconvex and nonsmooth optimization problems are important and challenging for machine learning.
This paper presents a new analysis showing fast convergence of PPGD.
arXiv Detail & Related papers (2023-04-20T17:39:24Z) - Linear Convergence of ISTA and FISTA [8.261388753972234]
We revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation.
We find that the previous assumption that the smooth part is convex weakens the least-squares model.
We generalize the linear convergence to composite optimization in both the objective value and the squared proximal subgradient norm.
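For reference, a textbook ISTA iteration for the sparse linear inverse problem min_x 0.5*||Ax - b||^2 + lam*||x||_1 is sketched below (a minimal illustration, not the paper's refined analysis):

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, b, lam, iters=500):
    # Gradient step on the least-squares term followed by soft-thresholding.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x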
arXiv Detail & Related papers (2022-12-13T02:02:50Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
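A hedged numerical sketch of this convex reformulation is given below; the random subsampling of activation patterns, the illustrative names, and the use of cvxpy are assumptions of this sketch rather than the paper's exact procedure.

import numpy as np
import cvxpy as cp

def convex_relu_fit(X, y, beta=1e-3, n_candidates=50, seed=0):
    # Enumerate a subset of ReLU activation patterns D_i = diag(1[X g >= 0])
    # via random directions g, then solve the group-regularized convex
    # program whose solutions correspond to two-layer ReLU networks.
    n, d = X.shape
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, n_candidates))
    patterns = np.unique((X @ G >= 0).astype(float), axis=1)   # n x P
    P = patterns.shape[1]

    U, V = cp.Variable((d, P)), cp.Variable((d, P))
    preds, constraints = [], []
    for i in range(P):
        d_i = patterns[:, i]
        preds.append(cp.multiply(d_i, X @ (U[:, i] - V[:, i])))
        s_i = 2 * d_i - 1
        # Sign constraints keep each block consistent with its pattern.
        constraints += [cp.multiply(s_i, X @ U[:, i]) >= 0,
                        cp.multiply(s_i, X @ V[:, i]) >= 0]
    loss = 0.5 * cp.sum_squares(sum(preds) - y)
    reg = beta * sum(cp.norm(U[:, i]) + cp.norm(V[:, i]) for i in range(P))
    prob = cp.Problem(cp.Minimize(loss + reg), constraints)
    prob.solve()
    return U.value, V.value, prob.value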
arXiv Detail & Related papers (2020-02-25T23:05:33Z) - GradientDICE: Rethinking Generalized Offline Estimation of Stationary
Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state-of-the-art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.