Optimal Stochastic Non-smooth Non-convex Optimization through
Online-to-Non-convex Conversion
- URL: http://arxiv.org/abs/2302.03775v1
- Date: Tue, 7 Feb 2023 22:09:20 GMT
- Title: Optimal Stochastic Non-smooth Non-convex Optimization through
Online-to-Non-convex Conversion
- Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona
- Abstract summary: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique.
For deterministic and second-order smooth objectives, applying more advanced optimistic online learning techniques enables a new complexity of $O(\epsilon^{-1.5}\delta^{-0.5})$ and recovers all optimal or best-known results.
- Score: 56.92236659731376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present new algorithms for optimizing non-smooth, non-convex stochastic
objectives based on a novel analysis technique. This improves the current
best-known complexity for finding a $(\delta,\epsilon)$-stationary point from
$O(\epsilon^{-4}\delta^{-1})$ stochastic gradient queries to
$O(\epsilon^{-3}\delta^{-1})$, which we also show to be optimal. Our primary
technique is a reduction from non-smooth non-convex optimization to online
learning, after which our results follow from standard regret bounds in online
learning. For deterministic and second-order smooth objectives, applying more
advanced optimistic online learning techniques enables a new complexity of
$O(\epsilon^{-1.5}\delta^{-0.5})$. Our techniques also recover all optimal or
best-known results for finding $\epsilon$ stationary points of smooth or
second-order smooth objectives in both stochastic and deterministic settings.
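To make the reduction concrete, the following is a minimal sketch of an online-to-non-convex conversion with projected online gradient descent as the inner learner. The step bound D, window length K, step size, and the random-point gradient query are illustrative choices for this sketch, not the paper's exact construction; loss_grad stands for a hypothetical stochastic gradient oracle.

```python
import numpy as np

def online_to_nonconvex(loss_grad, x0, T, D, K, rng=None):
    """Sketch of an online-to-non-convex conversion (assumes T is a multiple of K).

    loss_grad(w) -> stochastic gradient of the objective at w (hypothetical oracle)
    x0           -> starting point (NumPy array)
    T            -> total number of stochastic gradient queries
    D            -> norm bound on each online-learner update (illustrative)
    K            -> window length over which iterates and gradients are averaged
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    delta = np.zeros_like(x)            # the online learner's current play
    eta = D / np.sqrt(T)                # illustrative online-gradient-descent step size
    grads, iterates, windows = [], [], []

    for t in range(T):
        x_new = x + delta
        # Query the gradient at a random point on the segment [x, x_new].
        s = rng.uniform()
        w = x + s * (x_new - x)
        g = loss_grad(w)
        grads.append(g)
        iterates.append(w)
        x = x_new

        # Feed the linear loss <g, .> back to the online learner: projected OGD step.
        delta = delta - eta * g
        norm = np.linalg.norm(delta)
        if norm > D:
            delta *= D / norm

        # Close a window: the averaged gradient serves as the stationarity
        # certificate for the averaged iterate.
        if (t + 1) % K == 0:
            windows.append((np.mean(iterates, axis=0), np.mean(grads, axis=0)))
            grads, iterates = [], []

    # Report the window whose averaged gradient has the smallest norm.
    return min(windows, key=lambda pair: np.linalg.norm(pair[1]))
```

The point of the sketch is the interface: the inner learner only ever receives linear losses $\langle g_t, \cdot \rangle$, so a standard regret bound for it translates into a bound on the norm of an averaged gradient, which is the quantity behind the $(\delta,\epsilon)$-stationarity guarantee.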
Related papers
- Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization [8.670873561640903]
We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points for nonsmooth nonconvex objectives in decentralized settings.
Ours is the first finite-time guarantee for decentralized nonsmooth optimization.
arXiv Detail & Related papers (2024-06-03T16:09:34Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows us to utilize recent advancements in nonsmooth nonconvex optimization.
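For context, the Goldstein subdifferential mentioned above is the standard object behind the $(\delta,\epsilon)$-stationarity notion used across these papers; the following is the usual textbook definition rather than anything specific to this entry. For a Lipschitz function $F$, the Goldstein $\delta$-subdifferential at $x$ is $\partial_\delta F(x) = \mathrm{conv}\big(\bigcup_{\|y-x\|\le\delta} \partial F(y)\big)$, and $x$ is a $(\delta,\epsilon)$-stationary point when $\min_{g\in\partial_\delta F(x)} \|g\| \le \epsilon$.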
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - Variance-reduced Clipping for Non-convex Optimization [24.765794811146144]
Gradient clipping is a technique used in deep learning applications such as large-scale language modeling.
Recent experimental studies have observed a fairly special smoothness behavior of the training objective, which clipping can exploit to improve the order of the gradient complexity.
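For reference, the snippet below is a minimal sketch of plain global gradient-norm clipping, the basic operation behind such methods; it is not the variance-reduced variant proposed in that paper, and the threshold max_norm and the loss_grad oracle are illustrative assumptions.

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Scale grad so that its Euclidean norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

# Illustrative use inside a plain SGD step (hypothetical loss_grad oracle):
#   g = clip_gradient(loss_grad(x), max_norm=1.0)
#   x = x - learning_rate * g
```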
arXiv Detail & Related papers (2023-03-02T00:57:38Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate practical performance of the proposed method superior even to existing second-order-based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - Oracle Complexity in Nonsmooth Nonconvex Optimization [49.088972349825085]
It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points.
In this paper, we prove an inherent trade-off between optimization and smoothing dimension.
arXiv Detail & Related papers (2021-04-14T10:42:45Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization in the nonconvex-strongly-concave setting, where only function evaluations are available.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
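To make "zeroth-order" concrete, the snippet below sketches the standard two-point random-direction gradient estimator that such methods build on; the function f, smoothing radius mu, and spherical direction sampling are generic illustrative choices, not the specific variance-reduced algorithm of that paper.

```python
import numpy as np

def two_point_gradient_estimate(f, x, mu=1e-3, rng=None):
    """Estimate the gradient of f at x from two function evaluations."""
    rng = rng if rng is not None else np.random.default_rng(0)
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)              # uniform direction on the unit sphere
    # Finite difference along u; the dimension factor makes this an unbiased
    # estimate of the gradient of a smoothed version of f.
    return x.size * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
```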
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)