Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm
- URL: http://arxiv.org/abs/2306.02159v1
- Date: Sat, 3 Jun 2023 17:05:13 GMT
- Title: Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm
- Authors: Arya Akhavan, Evgenii Chzhen, Massimiliano Pontil, and Alexandre B.
Tsybakov
- Abstract summary: This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
- Score: 87.22224691317766
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This work studies minimization problems with zero-order noisy oracle
information under the assumption that the objective function is highly smooth
and possibly satisfies additional properties. We consider two kinds of
zero-order projected gradient descent algorithms, which differ in the form of
the gradient estimator. The first algorithm uses a gradient estimator based on
randomization over the $\ell_2$ sphere due to Bach and Perchet (2016). We
present an improved analysis of this algorithm on the class of highly smooth
and strongly convex functions studied in the prior work, and we derive rates of
convergence for two more general classes of non-convex functions. Namely, we
consider highly smooth functions satisfying the Polyak-{\L}ojasiewicz condition
and the class of highly smooth functions with no additional property. The
second algorithm is based on randomization over the $\ell_1$ sphere, and it
extends to the highly smooth setting the algorithm that was recently proposed
for Lipschitz convex functions in Akhavan et al. (2022). We show that, in the
case of a noiseless oracle, this novel algorithm enjoys better bounds on bias and
variance than the $\ell_2$ randomization and the commonly used Gaussian
randomization algorithms, while in the noisy case both $\ell_1$ and $\ell_2$
algorithms benefit from similar improved theoretical guarantees. The
improvements are achieved thanks to a new proof technique based on Poincar\'e-type
inequalities for uniform distributions on the $\ell_1$ or $\ell_2$
spheres. The results are established under weak (almost adversarial)
assumptions on the noise. Moreover, we provide minimax lower bounds proving
optimality or near optimality of the obtained upper bounds in several cases.
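The two gradient estimators described above can be sketched in a few lines. The following is a minimal illustration rather than the paper's exact algorithm: the perturbation direction is drawn uniformly from the $\ell_2$ or $\ell_1$ unit sphere, the kernel $K(r) = 3r/2$ is one valid choice for smoothness order $\beta = 2$ (the papers build higher-order kernels from Legendre polynomials), and the function names, step-size and perturbation schedules, and the ball-projection constraint set are placeholder assumptions.

```python
import numpy as np


def sample_l2_sphere(d, rng):
    """Direction drawn uniformly from the Euclidean (ell_2) unit sphere."""
    z = rng.standard_normal(d)
    return z / np.linalg.norm(z)


def sample_l1_sphere(d, rng):
    """Direction drawn uniformly from the ell_1 unit sphere: exponential
    magnitudes with random signs, normalized by their ell_1 norm."""
    e = rng.exponential(size=d)
    s = rng.choice([-1.0, 1.0], size=d)
    return s * e / np.sum(e)


def grad_estimate(f, x, h, rng, randomization="l2", kernel=lambda r: 1.5 * r):
    """Two-point zero-order gradient estimate with kernel smoothing.

    K(r) = 3r/2 is a valid kernel for smoothness order beta = 2; higher
    smoothness uses Legendre-polynomial-based kernels (assumption: this
    mirrors the two-point form used in this line of work)."""
    d = x.shape[0]
    r = rng.uniform(-1.0, 1.0)
    if randomization == "l2":
        zeta = sample_l2_sphere(d, rng)
        direction = zeta                 # ell_2 estimator: estimate along zeta
    else:
        zeta = sample_l1_sphere(d, rng)
        direction = np.sign(zeta)        # ell_1 estimator: estimate along sign(zeta)
    y_plus = f(x + h * r * zeta)         # these queries are noisy in the paper's setting
    y_minus = f(x - h * r * zeta)
    return (d / (2.0 * h)) * (y_plus - y_minus) * kernel(r) * direction


def project_l2_ball(x, radius):
    """Euclidean projection onto an ell_2 ball, standing in for a general
    convex constraint set Theta."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)


def zero_order_pgd(f, x0, steps, radius, eta0=1.0, h0=0.5,
                   randomization="l2", seed=0):
    """Zero-order projected gradient descent; the decaying schedules here are
    illustrative placeholders, not the tuned choices from the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        h_t = h0 / t ** 0.25
        eta_t = eta0 / t
        g = grad_estimate(f, x, h_t, rng, randomization)
        x = project_l2_ball(x - eta_t * g, radius)
    return x


# Illustrative run on a smooth, strongly convex test function.
f = lambda x: float(np.sum((x - 0.3) ** 2))
x_hat = zero_order_pgd(f, np.zeros(5), steps=2000, radius=1.0, randomization="l1")
```

A smooth strongly convex quadratic, as in the last two lines, is a convenient sanity check: both the $\ell_2$ and $\ell_1$ variants should approach the constrained minimizer using only function evaluations.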
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function at the points it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Fully Zeroth-Order Bilevel Programming via Gaussian Smoothing [7.143879014059895]
We study and analyze zeroth-order approximation algorithms for solving bilevel problems.
To the best of our knowledge, this is the first time that sample bounds are established for a fully zeroth-order bilevel optimization algorithm.
arXiv Detail & Related papers (2024-03-29T21:12:25Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful property of the Goldstein subdifferential set, which allows us to leverage recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises:
High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation [22.758674468435302]
In a heavy-tailed noise regime, the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment.
This paper provides a comprehensive analysis of nonsmooth convex optimization with heavy-tailed noises.
arXiv Detail & Related papers (2023-03-22T03:05:28Z) - A gradient estimator via L1-randomization for online zero-order
optimization with two point feedback [93.57603470949266]
We present a novel gradient estimator based on two function evaluations and randomization.
We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise.
We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem.
arXiv Detail & Related papers (2022-05-27T11:23:57Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving private convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.