Deterministic Nonsmooth Nonconvex Optimization
- URL: http://arxiv.org/abs/2302.08300v1
- Date: Thu, 16 Feb 2023 13:57:19 GMT
- Title: Deterministic Nonsmooth Nonconvex Optimization
- Authors: Michael I. Jordan, Guy Kornowski, Tianyi Lin, Ohad Shamir, Manolis
Zampetakis
- Abstract summary: We show that randomization is necessary to obtain a dimension-free rate.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
- Score: 94.01526844386977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions
by producing $(\delta,\epsilon)$-stationary points. Several recent works have
presented randomized algorithms that produce such points using $\tilde
O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the
dimension $d$. It has been an open problem as to whether a similar result can
be obtained via a deterministic algorithm. We resolve this open problem,
showing that randomization is necessary to obtain a dimension-free rate. In
particular, we prove a lower bound of $\Omega(d)$ for any deterministic
algorithm. Moreover, we show that unlike smooth or convex optimization, access
to function values is required for any deterministic algorithm to halt within
any finite time.
On the other hand, we prove that if the function is even slightly smooth,
then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be
obtained by a deterministic algorithm with merely a logarithmic dependence on
the smoothness parameter. Motivated by these findings, we turn to study the
complexity of deterministically smoothing Lipschitz functions. Though there are
efficient black-box randomized smoothings, we start by showing that no such
deterministic procedure can smooth functions in a meaningful manner, resolving
an open question. We then bypass this impossibility result for the structured
case of ReLU neural networks. To that end, in a practical white-box setting in
which the optimizer is granted access to the network's architecture, we propose
a simple, dimension-free, deterministic smoothing that provably preserves
$(\delta,\epsilon)$-stationary points. Our method applies to a variety of
architectures of arbitrary depth, including ResNets and ConvNets. Combined with
our algorithm, this yields the first deterministic dimension-free algorithm for
optimizing ReLU networks, circumventing our lower bound.
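For reference, the stationarity notion used above is the standard Goldstein one, and the randomized smoothing referred to is the usual uniform-ball average; these definitions are standard in this literature and are restated here only for convenience:
$$\partial_\delta f(x) = \mathrm{conv}\Big(\bigcup_{y \in B_\delta(x)} \partial f(y)\Big), \qquad x \text{ is } (\delta,\epsilon)\text{-stationary} \iff \mathrm{dist}\big(0,\partial_\delta f(x)\big) \le \epsilon, \qquad f_\delta(x) = \mathbb{E}_{u \sim \mathrm{Unif}(B_1)}\big[f(x+\delta u)\big].$$
As a toy illustration of the white-box idea of trading a ReLU network for a slightly smooth surrogate, here is a minimal sketch assuming a PyTorch-style model; it is not the paper's construction and carries none of its guarantees.
```python
# Hedged sketch: replace each ReLU with a small-temperature softplus surrogate,
# making the network smooth while staying within O(beta) of the ReLU network
# pointwise. Illustration of the general white-box smoothing idea only;
# NOT the construction or the guarantees of the paper above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def smooth_relu(x: torch.Tensor, beta: float = 1e-3) -> torch.Tensor:
    # beta * softplus(x / beta) -> max(0, x) as beta -> 0, with error <= beta * log 2.
    return beta * F.softplus(x / beta)

class SmoothedMLP(nn.Module):
    def __init__(self, dims=(10, 64, 64, 1), beta: float = 1e-3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:])
        )
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers[:-1]:
            x = smooth_relu(layer(x), self.beta)  # smooth surrogate for ReLU
        return self.layers[-1](x)
```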
Related papers
- Quantum Algorithms for Non-smooth Non-convex Optimization [30.576546266390714]
This paper considers the problem of finding a $(\delta,\epsilon)$-Goldstein stationary point of a Lipschitz continuous objective.
We construct a zeroth-order quantum estimator for the surrogate oracle function.
arXiv Detail & Related papers (2024-10-21T16:52:26Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives that are possibly neither smooth nor convex, using only zero-order evaluations (see the zero-order estimator sketch following this list).
Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows us to leverage recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the
end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that this guarantee holds for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - On the Complexity of Finding Small Subgradients in Nonsmooth
Optimization [31.714928102950584]
We show that no dimension-free rate can be achieved by a deterministic algorithm.
We show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex.
arXiv Detail & Related papers (2022-09-21T13:30:00Z) - Near-Optimal Lower Bounds For Convex Optimization For All Orders of
Smoothness [26.71898403195793]
We study the complexity of optimizing highly smooth convex functions.
For a positive integer $p$, we want to find an $\epsilon$-approximate minimum of a convex function $f$ whose $p$-th order derivatives are Lipschitz.
We prove a new lower bound that matches this bound (up to log factors), and holds not only for randomized algorithms, but also for quantum algorithms.
arXiv Detail & Related papers (2021-12-02T10:51:43Z) - Oracle Complexity in Nonsmooth Nonconvex Optimization [49.088972349825085]
It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points.
In this paper, we prove an inherent trade-off between optimization and smoothing dimension.
arXiv Detail & Related papers (2021-04-14T10:42:45Z) - No quantum speedup over gradient descent for non-smooth convex
optimization [22.16973542453584]
We are given black-box access to a (not necessarily smooth) convex function $f:\mathbb{R}^n \to \mathbb{R}$ and its (sub)gradient.
Our goal is to find an $\epsilon$-approximate minimum of $f$ starting from a point that is distance at most $R$ from the true minimum.
We show that although the function family used in the lower bound is hard for randomized algorithms, it can be solved using $O(GR/\epsilon)$ quantum queries.
arXiv Detail & Related papers (2020-10-05T06:32:47Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the gradient-free minimax optimization problem in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
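Two of the papers above (the zero-order nonsmooth nonconvex paper and the gradient-free minimax paper) access the objective only through function values. For concreteness, here is a minimal sketch of the standard two-point zero-order gradient estimator; it is an illustrative assumption about that setup, not code taken from any of these works.
```python
# Minimal sketch (assumed, not from any paper above): the standard two-point
# zero-order estimator of the gradient of the uniform-ball smoothing
# f_delta(x) = E_v[f(x + delta * v)], using only function evaluations.
import numpy as np

def two_point_grad_estimate(f, x, delta=1e-3, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    # In expectation over u, this equals the gradient of the smoothed function f_delta at x.
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Usage: estimate a (smoothed) gradient of a nonsmooth, 1-Lipschitz function.
f = lambda z: np.abs(z).max()
print(two_point_grad_estimate(f, np.ones(5)))
```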