Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions
- URL: http://arxiv.org/abs/2103.01578v2
- Date: Mon, 12 Apr 2021 14:16:38 GMT
- Title: Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions
- Authors: Daiki Morinaga, Kazuto Fukuchi, Jun Sakuma, and Youhei Akimoto
- Abstract summary: The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function.
The convergence rate of the (1+1)-ES is derived explicitly and rigorously for this class of functions.
- Score: 20.666734673282498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The (1+1)-evolution strategy (ES) with success-based step-size adaptation is
analyzed on a general convex quadratic function and its monotone
transformation, that is, $f(x) = g((x - x^*)^\mathrm{T} H (x - x^*))$, where
$g:\mathbb{R}\to\mathbb{R}$ is a strictly increasing function, $H$ is a
positive-definite symmetric matrix, and $x^* \in \mathbb{R}^d$ is the optimal
solution of $f$. The convergence rate, that is, the decrease rate of the
distance from a search point $m_t$ to the optimal solution $x^*$, is proven to
be in $O(\exp( - L / \mathrm{Tr}(H) ))$, where $L$ is the smallest eigenvalue
of $H$ and $\mathrm{Tr}(H)$ is the trace of $H$. This result generalizes the
known rate of $O(\exp(- 1/d ))$ for the case of $H = I_{d}$ ($I_d$ is the
identity matrix of dimension $d$) and $O(\exp(- 1/ (d\cdot\xi) ))$ for the case
of $H = \mathrm{diag}(\xi \cdot I_{d/2}, I_{d/2})$. To the best of our
knowledge, this is the first study in which the convergence rate of the
(1+1)-ES is derived explicitly and rigorously on a general convex quadratic
function, which depicts the impact of the distribution of the eigenvalues in
the Hessian $H$ on the optimization and not only the impact of the condition
number of $H$.
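For intuition, the stated bound recovers the known special cases: for $H = I_d$ the exponent is $L/\mathrm{Tr}(H) = 1/d$, and for $H = \mathrm{diag}(\xi \cdot I_{d/2}, I_{d/2})$ with $\xi \ge 1$ it is $L/\mathrm{Tr}(H) = 2/(d(\xi+1)) = \Theta(1/(d\xi))$. The analyzed algorithm is compact enough to sketch; below is a minimal NumPy implementation of a (1+1)-ES with a success-based (1/5-type) step-size rule on $f(x) = (x - x^*)^{\mathrm{T}} H (x - x^*)$. The function name and the update factors ($e^{0.2}$ on success, $e^{-0.05}$ on failure) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def one_plus_one_es(H, x_star, m0, sigma0, budget=20000,
                    inc=np.exp(0.2), dec=np.exp(-0.05)):
    """(1+1)-ES with a success-based (1/5-type) step-size rule.

    Minimizes f(x) = (x - x_star)^T H (x - x_star). The update factors
    `inc` and `dec` are illustrative: they keep sigma stationary at a
    success probability of 0.05 / (0.2 + 0.05) = 1/5.
    """
    f = lambda x: (x - x_star) @ H @ (x - x_star)
    m, sigma = np.asarray(m0, dtype=float), float(sigma0)
    fm = f(m)
    for _ in range(budget):
        candidate = m + sigma * np.random.randn(m.size)
        fc = f(candidate)
        if fc <= fm:              # success: accept the candidate, enlarge sigma
            m, fm = candidate, fc
            sigma *= inc
        else:                     # failure: keep m, shrink sigma
            sigma *= dec
    return m, sigma

# Ill-conditioned example: H = diag(xi * I_{d/2}, I_{d/2}) with xi = 100, d = 20.
d, xi = 20, 100.0
H = np.diag(np.concatenate([xi * np.ones(d // 2), np.ones(d // 2)]))
x_star = np.zeros(d)
m, _ = one_plus_one_es(H, x_star, m0=np.ones(d), sigma0=1.0)
print(np.linalg.norm(m - x_star))   # distance to the optimum after the budget
```

Varying $d$ and $\xi$ in this sketch and measuring the per-iteration decrease of $\|m_t - x^*\|$ gives an empirical check of the predicted $\exp(-L/\mathrm{Tr}(H))$-type scaling.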
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT}) + \epsilon$.
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\left[\|\nabla f(x^k)\|_1\right] \le O(\frac{\sqrt{d}}{T^{1/4}})$ measured by the $\ell_1$ norm.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\left[\|\nabla f(x^k)\|_2\right] \le O(\frac{1}{T^{1/4}})$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate a binary matrix $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) for stochastic composite optimization problems and analyze its convergence under the KL inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Unique Games hardness of Quantum Max-Cut, and a conjectured
vector-valued Borell's inequality [6.621324975749854]
We show that the noise stability of a function $f: \mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$.
We conjecture that the expected value of $\langle f(\boldsymbol{x}), f(\boldsymbol{y}) \rangle$ is minimized by the function $f(x) = x_{\leq k} / \Vert x_{\leq k} \Vert$.
arXiv Detail & Related papers (2021-11-01T20:45:42Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
Computing the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(A R(z))$, for $R(z) = (\frac{1}{n} X X^{\mathrm{T}} - z I_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_{\ast} / \sqrt{n})$.
Here, we derive a bound on the deterministic-equivalent error $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Accelerating Optimization and Reinforcement Learning with
Quasi-Stochastic Approximation [2.294014185517203]
This paper sets out to extend convergence theory to quasi-stochastic approximations.
It is illustrated with applications to gradient-free optimization and policy gradient algorithms for reinforcement learning.
arXiv Detail & Related papers (2020-09-30T04:44:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.