Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling
- URL: http://arxiv.org/abs/2003.13001v6
- Date: Wed, 1 Dec 2021 03:22:09 GMT
- Title: Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
Gradients and Adaptive Sampling
- Authors: HanQin Cai, Daniel Mckenzie, Wotao Yin, Zhenliang Zhang
- Abstract summary: We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
- Score: 29.600975900977343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of minimizing a high-dimensional objective function,
which may include a regularization term, using (possibly noisy) evaluations of
the function. Such optimization is also called derivative-free, zeroth-order,
or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder
$\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the
underlying gradient is approximately sparse at an iterate, ZORO needs very few
objective function evaluations to obtain a new iterate that decreases the
objective function. We achieve this with an adaptive, randomized gradient
estimator, followed by an inexact proximal-gradient scheme. Under a novel
approximately sparse gradient assumption and various different convex settings,
we show the (theoretical and empirical) convergence rate of ZORO is only
logarithmically dependent on the problem dimension. Numerical experiments show
that ZORO outperforms the existing methods with similar assumptions, on both
synthetic and real datasets.
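To make the two ingredients in the abstract concrete (a compressed-sensing-style sparse gradient estimate built from few function queries, followed by an inexact proximal-gradient step), here is a minimal sketch. It is not the authors' reference implementation: the sparse-recovery step uses simple iterative hard thresholding in place of the paper's solver, and the query budget, step sizes, sparsity level, and $\ell_1$ prox are illustrative choices.

```python
# Minimal sketch of the ZORO idea (not the authors' reference implementation):
# recover an approximately sparse gradient from few finite-difference queries
# via compressed sensing, then take an inexact proximal-gradient step.
# CoSaMP-style recovery is replaced by simple iterative hard thresholding (IHT).
import numpy as np

def zo_sparse_gradient(f, x, num_queries, sparsity, delta=1e-4, iht_steps=20):
    """Estimate grad f(x), assuming it is approximately `sparsity`-sparse."""
    d = x.size
    # Rademacher sensing directions; columns of Z are roughly unit-norm.
    Z = np.random.choice([-1.0, 1.0], size=(num_queries, d)) / np.sqrt(num_queries)
    fx = f(x)
    # Finite-difference measurements y_i ~ <z_i, grad f(x)>.
    y = np.array([(f(x + delta * z) - fx) / delta for z in Z])
    # Iterative hard thresholding: keep the `sparsity` largest entries each step.
    g = np.zeros(d)
    for _ in range(iht_steps):
        g = g + Z.T @ (y - Z @ g)
        keep = np.argsort(np.abs(g))[-sparsity:]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        g[~mask] = 0.0
    return g

def prox_l1(v, lam):
    """Soft thresholding: the proximal operator of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def zoro_like_minimize(f, x0, steps=100, lr=0.2, lam=0.0, sparsity=10):
    """Inexact proximal-gradient loop driven by the sparse ZO gradient estimate."""
    x = x0.copy()
    d = x.size
    # Compressed-sensing-style budget: a few multiples of sparsity * log(d).
    num_queries = int(4 * sparsity * np.log(d))
    for _ in range(steps):
        g = zo_sparse_gradient(f, x, num_queries, sparsity)
        x = prox_l1(x - lr * g, lr * lam)
    return x

if __name__ == "__main__":
    # Toy objective whose gradient is exactly sparse: 10 of 500 coordinates matter.
    rng = np.random.default_rng(0)
    d, s = 500, 10
    idx = rng.choice(d, size=s, replace=False)
    f = lambda x: np.sum(x[idx] ** 2)
    x_hat = zoro_like_minimize(f, rng.standard_normal(d), sparsity=s)
    print("final objective:", f(x_hat))  # should be near zero
```

The point of the sketch is the query budget: the number of function evaluations per iterate scales with the sparsity level times $\log d$ rather than with $d$ itself, which is where the logarithmic dimension dependence highlighted in the abstract comes from.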
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced ZO estimators for composite optimization problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv Detail & Related papers (2024-05-28T02:27:53Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
We develop a new primitive for optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any strongly convex function, obtained through a Moreau-Yosida smoothing.
arXiv Detail & Related papers (2021-06-17T13:33:05Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost; a generic sketch of the ZO gradient estimators such query counts refer to appears after this list.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first- and second-order variants of stochastic approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of gradient-free minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze an optimistic variant of adaptive gradient algorithms for nonconvex minimax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.