Mirror Natural Evolution Strategies
- URL: http://arxiv.org/abs/2308.00469v1
- Date: Tue, 1 Aug 2023 11:45:24 GMT
- Title: Mirror Natural Evolution Strategies
- Authors: Haishan Ye
- Abstract summary: We focus on the theory of zeroth-order optimization which utilizes both first-order and second-order information approximated by zeroth-order queries.
We show that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of the Hessian matrix of the objective function.
- Score: 10.495496415022064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Zeroth-order optimization has been widely used in machine learning
applications. However, the theoretical study of zeroth-order optimization
focuses on algorithms that approximate (first-order) gradients using
(zeroth-order) function-value differences along a random direction. The theory
of algorithms that approximate gradient and Hessian information through
zeroth-order queries is much less studied. In this paper, we focus on the
theory of zeroth-order optimization which utilizes both first-order and
second-order information approximated by zeroth-order queries. We first
propose a novel reparameterized objective function with parameters $(\mu,
\Sigma)$. This reparameterized objective function attains its optimum at the
minimizer and the inverse Hessian of the original objective function,
respectively, up to small perturbations. Accordingly, we propose a new
algorithm to minimize our proposed reparameterized objective, which we call
\texttt{MiNES} (mirror descent natural evolution strategy). We show that the
estimated covariance matrix of \texttt{MiNES} converges to the inverse of the
Hessian matrix of the objective function with a convergence rate
$\widetilde{\mathcal{O}}(1/k)$, where $k$ is the iteration number and
$\widetilde{\mathcal{O}}(\cdot)$ hides constant and $\log$ terms. We also
provide the explicit convergence rate of \texttt{MiNES} and show how the
covariance matrix promotes the convergence rate.
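To make the idea concrete, the following minimal Python sketch estimates the gradient and Hessian purely from function-value queries via Gaussian smoothing, drives a covariance estimate toward the inverse Hessian, and takes a covariance-preconditioned step on the mean. It illustrates the general MiNES idea under simplifying assumptions; it is not the paper's exact mirror-descent update, and all function names, step sizes, and sample counts are hypothetical choices.

```python
import numpy as np

def zo_grad_and_hess(f, mu, sigma=1e-2, num_samples=64, rng=None):
    """Gaussian-smoothing estimates of the gradient and Hessian of f at mu,
    built only from zeroth-order (function-value) queries."""
    rng = np.random.default_rng() if rng is None else rng
    d = mu.shape[0]
    grad, hess = np.zeros(d), np.zeros((d, d))
    f0 = f(mu)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        fp, fm = f(mu + sigma * u), f(mu - sigma * u)
        grad += (fp - fm) / (2.0 * sigma) * u
        # Stein-type estimator: E[(u^T H u / 2) (u u^T - I)] = H
        hess += (fp + fm - 2.0 * f0) / (2.0 * sigma ** 2) * (np.outer(u, u) - np.eye(d))
    return grad / num_samples, hess / num_samples

def mines_style_loop(f, mu0, steps=200, eta_mu=0.5, eta_h=0.1, sigma=1e-2):
    """Simplified MiNES-style loop: the covariance Sigma tracks the inverse of a
    running Hessian estimate, and the mean takes a Sigma-preconditioned step."""
    d = mu0.shape[0]
    mu, H = mu0.astype(float).copy(), np.eye(d)
    for _ in range(steps):
        g, H_hat = zo_grad_and_hess(f, mu, sigma)
        H = (1.0 - eta_h) * H + eta_h * H_hat          # running Hessian estimate
        w, V = np.linalg.eigh(0.5 * (H + H.T))
        H = (V * np.clip(w, 1e-3, None)) @ V.T         # keep it positive definite
        Sigma = np.linalg.inv(H)                       # covariance ~ inverse Hessian
        mu = mu - eta_mu * Sigma @ g                   # Newton-like (preconditioned) step
    return mu, Sigma

if __name__ == "__main__":
    # Ill-conditioned quadratic: the minimizer is 0 and the Hessian is A.
    A = np.diag([1.0, 10.0, 100.0])
    f = lambda x: 0.5 * x @ A @ x
    mu, Sigma = mines_style_loop(f, np.ones(3))
    print(mu)                    # close to the minimizer
    print(np.linalg.inv(Sigma))  # approximately A, i.e., Sigma tracks A^{-1}
```

The running average on the Hessian estimate is only a stand-in for the mirror-descent covariance update analyzed in the paper; it is meant to show why the estimated covariance can track the inverse Hessian while the mean converges to the minimizer.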
Related papers
- Single Point-Based Distributed Zeroth-Order Optimization with a Non-Convex Stochastic Objective Function [14.986031916712108]
We introduce a zeroth-order distributed optimization method based on a one-point gradient estimate combined with the gradient-tracking technique.
We prove that this new technique converges for a non-convex stochastic objective function in a noisy setting (a generic sketch of such one-point and two-point estimators appears after this list).
arXiv Detail & Related papers (2024-10-08T11:45:45Z)
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for composite optimization problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right)\right)$ to $\widetilde{\mathcal{O}}\left(\frac{d}{\epsilon^{2}}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zeroth-order evaluations.
Our analysis is based on a simple yet powerful lemma regarding the Goldstein subdifferential set, which allows us to build on recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods [57.050204432302195]
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions.
Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1 / T^{3})$ with deterministic oracles.
arXiv Detail & Related papers (2022-11-03T14:12:51Z)
- Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
- Majorization-minimization for Sparse Nonnegative Matrix Factorization with the $\beta$-divergence [2.3787352248749376]
It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation.
Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem.
We derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_1$-regularization or the more "aggressive" log-regularization.
arXiv Detail & Related papers (2022-07-13T16:09:29Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we consider gradient-free minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling [29.600975900977343]
We propose a new $\textbf{Z}$eroth-$\textbf{O}$rder $\textbf{R}$egularized $\textbf{O}$ptimization method, dubbed ZORO.
When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function.
arXiv Detail & Related papers (2020-03-29T11:01:17Z)
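Several of the related papers above rely on the same basic zeroth-order building blocks. The hedged Python sketch below shows a one-point Gaussian-smoothing gradient estimate (the kind used by the single-point distributed method listed first), the lower-variance two-point variant, and a naive hard-thresholding step as a stand-in for the sparse-gradient recovery idea behind ZORO. Names, constants, and the thresholding rule are illustrative assumptions, not code from any of these papers.

```python
import numpy as np

def one_point_grad(f, x, sigma=1e-2, rng=None):
    """One-point Gaussian-smoothing estimate: a single (possibly noisy)
    function query per sampled direction."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape[0])
    return f(x + sigma * u) / sigma * u

def two_point_grad(f, x, sigma=1e-2, rng=None):
    """Two-point (antithetic) estimate: lower variance, two queries per direction."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape[0])
    return (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u

def hard_threshold(g, k):
    """Keep the k largest-magnitude entries -- a crude stand-in for the
    compressed-sensing recovery ZORO uses when gradients are nearly sparse."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

# Usage: plain zeroth-order gradient descent on a smooth test function.
f = lambda x: np.sum((x - 1.0) ** 2)
x = np.zeros(5)
for _ in range(500):
    x -= 0.05 * two_point_grad(f, x)
print(x)  # approaches the all-ones minimizer
```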
This list is automatically generated from the titles and abstracts of the papers in this site.