Parameter-free Stochastic Optimization of Variationally Coherent
Functions
- URL: http://arxiv.org/abs/2102.00236v1
- Date: Sat, 30 Jan 2021 15:05:34 GMT
- Title: Parameter-free Stochastic Optimization of Variationally Coherent
Functions
- Authors: Francesco Orabona and D\'avid P\'al
- Abstract summary: We design and analyze an algorithm for first-order optimization of a class of functions on $mathbbRdilon.
It is the first to achieve both these at the same time.
- Score: 19.468067110814808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design and analyze an algorithm for first-order stochastic optimization of
a large class of functions on $\mathbb{R}^d$. In particular, we consider the
\emph{variationally coherent} functions which can be convex or non-convex. The
iterates of our algorithm on variationally coherent functions converge almost
surely to the global minimizer $\boldsymbol{x}^*$. Additionally, the very same
algorithm with the same hyperparameters, after $T$ iterations guarantees on
convex functions that the expected suboptimality gap is bounded by
$\widetilde{O}(\|\boldsymbol{x}^* - \boldsymbol{x}_0\| T^{-1/2+\epsilon})$ for
any $\epsilon>0$. It is the first algorithm to achieve both these properties at
the same time. Also, the rate for convex functions essentially matches the
performance of parameter-free algorithms. Our algorithm is an instance of the
Follow The Regularized Leader algorithm with the added twist of using
\emph{rescaled gradients} and time-varying linearithmic regularizers.
Related papers
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing neither smooth nor convex points of Lipschitz objectives which are possibly using only zero-order evaluations.
Our analysis is based on a simple yet powerful.
Goldstein-subdifferential set, which allows recent advancements in.
nonsmooth non optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic
Composite Optimization [10.762749887051546]
We propose two-time scale algorithms: ProxDAS-A and Proxcal$DASA-GT.
Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-it operations, or stronger assumptions.
arXiv Detail & Related papers (2023-02-20T05:16:18Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization [7.197233473373693]
We consider the problem of minimizing the sum of two convex functions.
One has Lipschitz-continuous gradients, and can be accessed via oracles, whereas the other is "simple"
We show that one can achieve an $epsilon$ primaldual gap (in expectation) in $tildeO (1/ sqrtepsilon)$ iterations.
arXiv Detail & Related papers (2022-05-25T13:01:09Z) - Convergence rates of the stochastic alternating algorithm for
bi-objective optimization [0.0]
We show that alternating algorithms achieve a sublinear convergence rate of $mathcalO (1/sqrtT)$ under strong convexity.
Importantly, by varying the proportion of steps applied to each function, one can determine an approximation to the front.
arXiv Detail & Related papers (2022-03-20T17:31:08Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of oracles and the linear-minimization oracle required by the proposed algorithm, are of order $mathcalO_T(epsilon-2)$ and $mathcalO_T(epsilon-3)$ respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.