Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions
- URL: http://arxiv.org/abs/2103.01578v2
- Date: Mon, 12 Apr 2021 14:16:38 GMT
- Title: Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions
- Authors: Daiki Morinaga, Kazuto Fukuchi, Jun Sakuma, and Youhei Akimoto
- Abstract summary: The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function.
The convergence rate of the (1+1)-ES is derived explicitly and rigorously on a general convex quadratic function.
- Score: 20.666734673282498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The (1+1)-evolution strategy (ES) with success-based step-size adaptation is
analyzed on a general convex quadratic function and its monotone
transformation, that is, $f(x) = g((x - x^*)^\mathrm{T} H (x - x^*))$, where
$g:\mathbb{R}\to\mathbb{R}$ is a strictly increasing function, $H$ is a
positive-definite symmetric matrix, and $x^* \in \mathbb{R}^d$ is the optimal
solution of $f$. The convergence rate, that is, the decrease rate of the
distance from a search point $m_t$ to the optimal solution $x^*$, is proven to
be in $O(\exp( - L / \mathrm{Tr}(H) ))$, where $L$ is the smallest eigenvalue
of $H$ and $\mathrm{Tr}(H)$ is the trace of $H$. This result generalizes the
known rate of $O(\exp(- 1/d ))$ for the case of $H = I_{d}$ ($I_d$ is the
identity matrix of dimension $d$) and $O(\exp(- 1/ (d\cdot\xi) ))$ for the case
of $H = \mathrm{diag}(\xi \cdot I_{d/2}, I_{d/2})$. To the best of our
knowledge, this is the first study in which the convergence rate of the
(1+1)-ES is derived explicitly and rigorously on a general convex quadratic
function, which depicts the impact of the distribution of the eigenvalues in
the Hessian $H$ on the optimization and not only the impact of the condition
number of $H$.
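The algorithm analyzed in the abstract can be sketched in a few lines. The following is a minimal illustration of a (1+1)-ES with success-based step-size adaptation on $f(x) = (x - x^*)^\mathrm{T} H (x - x^*)$; the adaptation constants ($0.8/d$ on success, $-0.2/d$ on failure, a 1/5-success-rule-style balance) are illustrative choices, not the exact parameters used in the paper's analysis.

```python
import numpy as np

def one_plus_one_es(H, x_star, m, sigma=1.0, iters=3000, seed=0):
    """Hedged sketch of the (1+1)-ES with success-based step-size adaptation,
    minimizing f(x) = (x - x*)^T H (x - x*). Constants are illustrative."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    d = len(m)
    f = lambda x: (x - x_star) @ H @ (x - x_star)
    fm = f(m)
    for _ in range(iters):
        y = m + sigma * rng.standard_normal(d)  # isotropic Gaussian offspring
        fy = f(y)
        if fy <= fm:                  # success: accept and enlarge the step size
            m, fm = y, fy
            sigma *= np.exp(0.8 / d)
        else:                         # failure: shrink the step size
            sigma *= np.exp(-0.2 / d)
    return m, sigma, fm

# Usage: ill-conditioned quadratic in d = 10 with eigenvalues from 1 to 100,
# so L = 1 and Tr(H) = 505 in the notation of the abstract.
d = 10
H = np.diag(np.linspace(1.0, 100.0, d))
x_star = np.ones(d)
m_end, sigma_end, f_end = one_plus_one_es(H, x_star, m=np.zeros(d))
```

Because the (1+1)-ES is elitist, the best-so-far value `fm` is non-increasing; the abstract's result says the distance to $x^*$ shrinks at a rate of order $\exp(-L/\mathrm{Tr}(H))$ per iteration, so badly spread eigenvalues (large trace relative to $L$) slow convergence even at a fixed condition number.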
Related papers
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\big[\|\nabla f(x_k)\|_1\big] \le O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the corresponding $\frac{1}{T}\sum_{k=1}^{T}\ldots$ rate of SGD.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded
Geometric Penalties [21.141544548229774]
We study min-max problems of the form $\min_{x} \max_{y} f(x, y)$, where the domains $\mathcal{M}$ and $\mathcal{N}$ are Hadamard manifolds.
We show accelerated global convergence while keeping the geometric constants in the rates bounded.
arXiv Detail & Related papers (2023-05-25T15:43:07Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for stochastic composite optimization problems.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study learning a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Unique Games hardness of Quantum Max-Cut, and a conjectured
vector-valued Borell's inequality [6.621324975749854]
We show that the noise stability of a function $f:\mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$.
We conjecture that the expected value of $\langle f(\boldsymbol{x}), f(\boldsymbol{y})\rangle$ is minimized by the function $f(x) = x_{\leq k} / \Vert x_{\leq k} \Vert$.
arXiv Detail & Related papers (2021-11-01T20:45:42Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-features technique, called Ternary Random Features (TRF).
Computing the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^{\mathrm{T}} - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F \leq \ldots$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^{\top} A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\epsilon)$.
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - Accelerating Optimization and Reinforcement Learning with
Quasi-Stochastic Approximation [2.294014185517203]
This paper sets out to extend convergence theory to quasi-stochastic approximations.
It is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
arXiv Detail & Related papers (2020-09-30T04:44:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.