Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
  Gradients and Adaptive Sampling
        - URL: http://arxiv.org/abs/2003.13001v6
 - Date: Wed, 1 Dec 2021 03:22:09 GMT
 - Title: Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse
  Gradients and Adaptive Sampling
 - Authors: HanQin Cai, Daniel Mckenzie, Wotao Yin, Zhenliang Zhang
 - Abstract summary: We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
 - Score: 29.600975900977343
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   We consider the problem of minimizing a high-dimensional objective function,
which may include a regularization term, using (possibly noisy) evaluations of
the function. Such optimization is also called derivative-free, zeroth-order,
or black-box optimization. We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder
$\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO. When the
underlying gradient is approximately sparse at an iterate, ZORO needs very few
objective function evaluations to obtain a new iterate that decreases the
objective function. We achieve this with an adaptive, randomized gradient
estimator, followed by an inexact proximal-gradient scheme. Under a novel
approximately sparse gradient assumption and various convex settings,
we show the (theoretical and empirical) convergence rate of ZORO is only
logarithmically dependent on the problem dimension. Numerical experiments show
that ZORO outperforms the existing methods with similar assumptions, on both
synthetic and real datasets.
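
As a concrete (and heavily simplified) illustration of the method described above, the sketch below implements a ZORO-style iteration in Python. It is not the authors' code: it assumes an l1 regularizer, uses plain iterative hard thresholding as a stand-in for the paper's compressed-sensing gradient recovery, and fixes the sparsity level s, sample count m, probe radius delta, and step size eta instead of choosing them adaptively as the paper does. All helper names and parameter defaults are illustrative.

```python
import numpy as np

def estimate_sparse_gradient(f, x, s, m, delta, rng, iht_iters=30):
    """Estimate an approximately s-sparse gradient of f at x from m finite
    differences along random Rademacher directions.  Iterative hard
    thresholding (IHT) stands in for the paper's sparse-recovery step."""
    d = x.size
    Z = rng.choice([-1.0, 1.0], size=(m, d))                    # probe directions z_i
    fx = f(x)
    y = np.array([(f(x + delta * z) - fx) / delta for z in Z])  # y_i ~ <z_i, grad f(x)>
    g = np.zeros(d)
    for _ in range(iht_iters):
        g = g + Z.T @ (y - Z @ g) / m          # gradient step on the recovery problem
        g[np.argsort(np.abs(g))[:-s]] = 0.0    # keep only the s largest-magnitude entries
    return g

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the regularizer assumed in this sketch)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def zoro_like(f, x0, lam=1e-2, eta=0.1, s=10, m=60, delta=1e-3, iters=200, seed=0):
    """Inexact proximal-gradient loop driven by the sparse zeroth-order
    gradient estimate.  All parameters here are fixed, illustrative choices;
    the paper instead adapts the sampling budget as the iterates progress."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g_hat = estimate_sparse_gradient(f, x, s, m, delta, rng)
        x = soft_threshold(x - eta * g_hat, eta * lam)          # prox step for lam * ||x||_1
    return x
```

Each iteration queries the objective only m + 1 times (one evaluation at x plus one per probe) rather than the d + 1 evaluations a full finite-difference gradient would need; since sparse recovery typically needs only m on the order of s log d measurements, this is where the logarithmic dimension dependence claimed above comes from when the gradient is approximately s-sparse.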
 
       
      
        Related papers
        - Gradient-free stochastic optimization for additive models [56.42455605591779]
We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition.
We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions.
arXiv  Detail & Related papers  (2025-03-03T23:39:08Z) - Obtaining Lower Query Complexities through Lightweight Zeroth-Order   Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv  Detail & Related papers  (2024-10-03T15:04:01Z) - Double Variance Reduction: A Smoothing Trick for Composite Optimization   Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv  Detail & Related papers  (2024-05-28T02:27:53Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for   Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv  Detail & Related papers  (2024-04-18T16:46:08Z) - Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
We develop a new primitive for optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of the Moreau-Yosida envelope.
arXiv  Detail & Related papers  (2021-06-17T13:33:05Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
  Optimization Framework [100.36569795440889]
This work studies the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost (a generic two-point ZO estimator is sketched after this list).
arXiv  Detail & Related papers  (2020-12-21T17:29:58Z) - Sparse Representations of Positive Functions via First and Second-Order
  Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first- and second-order variants of stochastic approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv  Detail & Related papers  (2020-11-13T21:54:28Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
  Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of gradient-free minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv  Detail & Related papers  (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
  Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv  Detail & Related papers  (2020-06-14T10:42:23Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
  Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave minimax problems.
Our experiments show that the faster convergence of adaptive gradient algorithms over their non-adaptive counterparts in GAN training can be observed empirically.
arXiv  Detail & Related papers  (2019-12-26T22:10:10Z) 
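
Many of the related papers above (the variance-reduced ZO estimators, the ZO proximal and minimax methods, and the randomized projected-gradient approximation) build on the same primitive: a randomized two-point finite-difference gradient estimator. The sketch below shows only that generic primitive, not any specific estimator from the papers listed; the Gaussian directions, smoothing radius mu, and sample count are illustrative choices.

```python
import numpy as np

def two_point_zo_gradient(f, x, mu=1e-3, num_samples=20, rng=None):
    """Generic two-point zeroth-order gradient estimator: the average of
    directional finite differences along random Gaussian directions.  In
    expectation it equals the gradient of a Gaussian-smoothed version of f,
    and it costs 2 * num_samples function queries."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_samples
```

Plugging such an estimator into projected, proximal, or minimax descent loops, and reducing its variance through control variates or coordinate importance sampling, is roughly the design space the papers above explore.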
        This list is automatically generated from the titles and abstracts of the papers in this site.
       
     