How the Move Acceptance Hyper-Heuristic Copes With Local Optima: Drastic
Differences Between Jumps and Cliffs
- URL: http://arxiv.org/abs/2304.10414v1
- Date: Thu, 20 Apr 2023 15:57:33 GMT
- Title: How the Move Acceptance Hyper-Heuristic Copes With Local Optima: Drastic
Differences Between Jumps and Cliffs
- Authors: Benjamin Doerr, Arthur Dremaux, Johannes Lutzeyer, and Aur\'elien
Stumpf
- Abstract summary: We show that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal cliff benchmark with remarkable efficiency.
We also show that the MAHH with the global bit-wise mutation operator instead of the local one-bit operator optimizes jump functions in time $O(\min\{m n^m,\frac{n^{2m-1}}{m!\Omega(m)^{m-2}}\})$, essentially the minimum of the runtimes of the $(1+1)$ EA and the MAHH.
This suggests that combining several ways to cope with local optima can be a fruitful approach.
- Score: 6.793248433673384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent work, Lissovoi, Oliveto, and Warwicker (Artificial Intelligence
(2023)) proved that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local
optimum of the multimodal cliff benchmark with remarkable efficiency. With its
$O(n^3)$ runtime, for almost all cliff widths $d,$ the MAHH massively
outperforms the $\Theta(n^d)$ runtime of simple elitist evolutionary algorithms
(EAs). For the most prominent multimodal benchmark, the jump functions, the
given runtime estimates of $O(n^{2m} m^{-\Theta(m)})$ and
$\Omega(2^{\Omega(m)})$, for gap size $m \ge 2$, are far apart and the real
performance of MAHH is still an open question.
In this work, we resolve this question. We prove that for any choice of the
MAHH selection parameter~$p$, the expected runtime of the MAHH on a jump
function with gap size $m = o(n^{1/2})$ is at least $\Omega(n^{2m-1} /
(2m-1)!)$. This renders the MAHH much slower than simple elitist evolutionary
algorithms with their typical $O(n^m)$ runtime.
We also show that the MAHH with the global bit-wise mutation operator instead
of the local one-bit operator optimizes jump functions in time $O(\min\{m
n^m,\frac{n^{2m-1}}{m!\Omega(m)^{m-2}}\})$, essentially the minimum of the
optimization times of the $(1+1)$ EA and the MAHH. This suggests that combining
several ways to cope with local optima can be a fruitful approach.
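To make the algorithmic setting concrete, below is a minimal Python sketch (not the authors' implementation) of the MAHH on the standard JUMP and CLIFF benchmarks. It assumes the usual textbook definitions of the two benchmarks, an ONLYIMPROVING operator that accepts only strictly better offspring, and an ALLMOVES operator selected with probability $p$; all function names are illustrative. Swapping `one_bit_flip` for `bit_wise_mutation` gives the global-mutation variant discussed above.

```python
import random


def jump(x, m):
    """JUMP_m fitness: OneMax with a gap of width m just below the all-ones optimum."""
    ones, n = sum(x), len(x)
    return m + ones if ones <= n - m or ones == n else n - ones


def cliff(x, d):
    """CLIFF_d fitness: OneMax with a drop of d - 1/2 once more than n - d bits are set."""
    ones = sum(x)
    return ones if ones <= len(x) - d else ones - d + 0.5


def one_bit_flip(x):
    """Local mutation: flip exactly one uniformly chosen bit."""
    y = list(x)
    y[random.randrange(len(y))] ^= 1
    return y


def bit_wise_mutation(x):
    """Global mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [b ^ int(random.random() < 1 / n) for b in x]


def mahh(f, n, p, mutate=one_bit_flip, max_iters=10**6):
    """MAHH with selection parameter p; returns the number of iterations used."""
    x = [random.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        if sum(x) == n:  # the all-ones string is the optimum of both benchmarks
            return t
        y = mutate(x)
        # With probability p the ALLMOVES operator accepts unconditionally;
        # otherwise ONLYIMPROVING accepts only strictly better offspring.
        if random.random() < p or f(y) > f(x):
            x = y
    return max_iters


if __name__ == "__main__":
    random.seed(1)
    n = 30
    print(mahh(lambda x: cliff(x, 5), n, p=1 / n))                           # CLIFF, local mutation
    print(mahh(lambda x: jump(x, 3), n, p=1 / n, mutate=bit_wise_mutation))  # JUMP, global mutation
```

Under these assumed definitions, the sketch mirrors the contrast drawn above: on CLIFF the acceptance mechanism alone copes with the local optimum, while on JUMP the local one-bit MAHH is slow and the bit-wise variant recovers essentially the $(1+1)$ EA's $O(n^m)$ behavior.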
Related papers
- Hyper-Heuristics Can Profit From Global Variation Operators [12.774575491521926]
We show that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal CLIFF benchmark with remarkable efficiency.
We also show that replacing the local one-bit mutation operator in the MAHH with the global bit-wise mutation operator, commonly used in EAs, yields a runtime of $\min\{1, O(\frac{e\ln(n)}{m})^m\} \cdot O(n^m)$ on JUMP functions.
arXiv Detail & Related papers (2024-07-19T12:10:05Z) - Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions [9.044970217182117]
The decomposition-based multi-objective evolutionary algorithm (MOEA/D) does not directly optimize a given multi-objective function $f$, but instead optimizes $N + 1$ single-objective subproblems of $f$ in a co-evolutionary manner.
We analyze for the first time how the MOEA/D with only standard mutation operators computes the whole Pareto front of the OneMinMax benchmark.
Our overall bound for power-law mutation suggests that the MOEA/D performs best for $N = O(n^{\beta - 1})$, resulting in an $O(n\ldots)$ runtime.
arXiv Detail & Related papers (2024-05-02T05:32:19Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - When Non-Elitism Meets Time-Linkage Problems [19.798298260257432]
We analyze the influence of non-elitism by comparing the performance of the elitist $(1+\lambda)$ EA and its non-elitist counterpart, the $(1,\lambda)$ EA.
We prove that with probability $1$ the $(1,\lambda)$ EA can reach the global optimum, and its expected runtime is $O(n^{3+c}\log n)$ with $\lambda = c \log_{\frac{e}{e-1}} n$ for the constant $c \ge 1$.
arXiv Detail & Related papers (2021-04-14T13:03:42Z) - Provably Breaking the Quadratic Error Compounding Barrier in Imitation
Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$.
We establish an upper bound $O(|\mathcal{S}| H^{3/2} / N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020).
We show the minimax suboptimality grows as $\Omega(H^{3/2} / N)$ when $|\mathcal{S}| \geq 3$, while the unknown-transition setting suffers from a larger sharp rate
arXiv Detail & Related papers (2021-02-25T15:50:19Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - A Rigorous Runtime Analysis of the $(1 + (\lambda, \lambda))$ GA on Jump
Functions [8.34061303235504]
We conduct the first runtime analysis of this algorithm on a multimodal problem class, the jump functions benchmark.
For the isolated problem of leaving the local optimum of jump functions, we determine provably optimal parameters that lead to a runtime of $(n/k)^{k/2} e^{\Theta(k)}$.
arXiv Detail & Related papers (2020-04-14T17:54:12Z)