Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
- URL: http://arxiv.org/abs/2002.04130v3
- Date: Mon, 29 Jun 2020 14:53:13 GMT
- Title: Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
- Authors: Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit
Sra
- Abstract summary: We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.
- Score: 84.49087114959872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first non-asymptotic analysis for finding stationary points of
nonsmooth, nonconvex functions. In particular, we study the class of Hadamard
semi-differentiable functions, perhaps the largest class of nonsmooth functions
for which the chain rule of calculus holds. This class contains examples such
as ReLU neural networks and others with non-differentiable activation
functions. We first show that finding an $\epsilon$-stationary point with
first-order methods is impossible in finite time. We then introduce the notion
of $(\delta, \epsilon)$-stationarity, which allows for an
$\epsilon$-approximate gradient to be the convex combination of generalized
gradients evaluated at points within distance $\delta$ to the solution. We
propose a series of randomized first-order methods and analyze their complexity
of finding a $(\delta, \epsilon)$-stationary point. Furthermore, we provide a
lower bound and show that our stochastic algorithm has min-max optimal
dependence on $\delta$. Empirically, our methods perform well for training ReLU
neural networks.
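To make the $(\delta, \epsilon)$-stationarity notion concrete, below is a minimal sketch in Python (not the paper's algorithm) of a gradient-sampling-style randomized first-order step: it averages generalized gradients drawn uniformly from the $\delta$-ball around the current iterate and treats a small norm of that average as a $(\delta, \epsilon)$-stationarity certificate. The test function, sample count, and step size are illustrative assumptions.

```python
# A minimal sketch, not the paper's algorithm: a gradient-sampling-style
# randomized first-order step aiming for a (delta, epsilon)-stationary point.
import numpy as np

def f(x):
    # Nonsmooth, nonconvex (weakly convex) test function with a kink at 0.
    return np.abs(x).sum() - 0.25 * np.dot(x, x)

def gen_grad(pts):
    # Generalized gradient of f at each row of pts (valid almost everywhere).
    return np.sign(pts) - 0.5 * pts

def sample_ball(center, delta, m, rng):
    # m points drawn uniformly from the Euclidean ball of radius delta.
    d = center.size
    u = rng.standard_normal((m, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    r = delta * rng.uniform(size=(m, 1)) ** (1.0 / d)
    return center + r * u

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5])
delta, epsilon, step, m = 0.1, 0.05, 0.05, 4000

for _ in range(500):
    # The uniform average below is one particular convex combination of
    # generalized gradients at points within distance delta of x, matching
    # the (delta, epsilon)-stationarity notion in the abstract.
    g = gen_grad(sample_ball(x, delta, m, rng)).mean(axis=0)
    if np.linalg.norm(g) <= epsilon:   # (delta, epsilon)-stationarity certificate
        break
    x = x - step * g                   # randomized first-order update

print(x, f(x), np.linalg.norm(g))
```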
Related papers
- Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems.
Our algorithms allow non-smooth and non-Euclidean distance functions as the proximal terms, yielding dimension-insensitive complexity estimates.
We prove that DISFOM can sharpen this bound to $\mathcal{O}(\log d / \epsilon^4)$ by employing variance reduction.
arXiv Detail & Related papers (2024-06-27T18:38:42Z) - Single-Loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions [41.43895948769255]
We study a class of non-smooth non-convex problems of the form $\min_x [\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$.
We propose a stochastic Moreau envelope approximate gradient method (SMAG), the first single-loop method for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate.
arXiv Detail & Related papers (2024-05-28T20:52:46Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm in general.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, i.e., minimizing a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-conjugate-gradient (Newton-CG) based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - On the Complexity of Finding Small Subgradients in Nonsmooth
Optimization [31.714928102950584]
We show that no dimension-free rate can be achieved by a deterministic algorithm.
We show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex.
arXiv Detail & Related papers (2022-09-21T13:30:00Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
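For reference, here is a minimal non-adaptive sketch of a Condat-Vu-style iteration for the objective above; the adaptive step-size rule that is the paper's contribution is not reproduced, and the problem instance ($f(x) = \tfrac{1}{2}\|x-b\|^2$, $g = \|\cdot\|_1$) and fixed step sizes are assumptions.

```python
# A minimal sketch of a Condat-Vu-style primal-dual iteration for
# min_x max_y f(x) + <Ax, y> - g*(y), with fixed (non-adaptive) step sizes.
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = rng.standard_normal(n)

grad_f = lambda x: x - b                        # gradient of the smooth term f(x) = 0.5*||x - b||^2
prox_g_conj = lambda v: np.clip(v, -1.0, 1.0)   # prox of g* (indicator of the unit l_inf ball, since g = ||.||_1)

Lf = 1.0                                        # Lipschitz constant of grad_f
opA = np.linalg.norm(A, 2)                      # operator norm of A
sigma = 1.0 / opA
tau = 1.0 / (opA + Lf)                          # a sufficient step-size condition: 1/tau - sigma*||A||^2 >= Lf

x, y = np.zeros(n), np.zeros(m)
for _ in range(1000):
    x_new = x - tau * (grad_f(x) + A.T @ y)                   # primal gradient step
    y = prox_g_conj(y + sigma * (A @ (2.0 * x_new - x)))      # dual prox step with extrapolation
    x = x_new

# x approximately minimizes 0.5*||x - b||^2 + ||A x||_1
print(0.5 * np.linalg.norm(x - b) ** 2 + np.abs(A @ x).sum())
```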
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Escaping Saddle Points for Nonsmooth Weakly Convex Functions via
Perturbed Proximal Algorithms [0.0]
Main results are based on a novel characterization of $\epsilon$-approximate local minima for nonsmooth functions.
We show that under standard assumptions, the perturbed proximal point, perturbed proximal gradient and perturbed proximal linear algorithms find $\epsilon$-approximate local minima of nonsmooth weakly convex functions.
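As an illustration of the perturbed proximal point idea (not the paper's exact algorithms), the sketch below perturbs the prox center by a small random amount at each step on a one-dimensional nonsmooth, weakly convex test function; the function, perturbation radius, prox parameter, and inner solver are assumptions.

```python
# A minimal sketch of a perturbed proximal point iteration on a 1-D nonsmooth,
# weakly convex test function; illustrative only, not the paper's algorithms.
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    # |x^2 - 1| is nonsmooth, nonconvex, and 2-weakly convex; x = 0 is a
    # non-minimizing stationary point, while x = +/-1 are local minima.
    return abs(x ** 2 - 1.0)

rng = np.random.default_rng(0)
lam, radius = 0.25, 1e-2     # lam < 1/2 keeps the prox subproblem strongly convex
x = 0.0                       # start exactly at the bad stationary point
for _ in range(50):
    center = x + rng.uniform(-radius, radius)   # random perturbation of the prox center
    # Proximal step: argmin_y f(y) + (1/(2*lam)) * (y - center)^2
    res = minimize_scalar(lambda y: f(y) + (y - center) ** 2 / (2 * lam),
                          bounds=(center - 3.0, center + 3.0), method="bounded")
    x = res.x

print(x, f(x))   # x ends up close to +/-1, an approximate local minimum
```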
arXiv Detail & Related papers (2021-02-04T19:17:13Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
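Below is a minimal sketch of a recursive one-over-t averaging update in the spirit of ROOT-SGD on a toy stochastic least-squares problem; the problem instance, step size, and exact update form are illustrative assumptions rather than the paper's precise algorithm.

```python
# A minimal sketch of recursive one-over-t gradient averaging (ROOT-SGD spirit)
# on f(z) = E[0.5*||z - a||^2] with a ~ N(x_star, I), minimized at x_star.
import numpy as np

rng = np.random.default_rng(0)
d, eta, T = 10, 0.2, 2000
x_star = rng.standard_normal(d)               # ground-truth minimizer

def sample_grad(z, a):
    # Gradient of the per-sample loss 0.5*||z - a||^2 at z for sample a.
    return z - a

x_prev = np.zeros(d)
a = x_star + rng.standard_normal(d)
v = sample_grad(x_prev, a)                    # v_1: plain stochastic gradient
x = x_prev - eta * v
for t in range(2, T + 1):
    a = x_star + rng.standard_normal(d)       # fresh sample xi_t
    # Recursive one-over-t averaging: re-evaluate the same sample at the
    # previous iterate to correct the running gradient estimate.
    v = sample_grad(x, a) + (1.0 - 1.0 / t) * (v - sample_grad(x_prev, a))
    x_prev, x = x, x - eta * v

print(np.linalg.norm(x - x_star))             # should be small
```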
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex
Functions? [31.714928102950584]
It is well-known that given a bounded, smooth nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points.
Perhaps the most natural relaxation is to find points which are near such $\epsilon$-stationary points.
arXiv Detail & Related papers (2020-02-27T08:26:34Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.