Speeding Up Hyper-Heuristics With Markov-Chain Operator Selection and the Only-Worsening Acceptance Operator
- URL: http://arxiv.org/abs/2506.01107v1
- Date: Sun, 01 Jun 2025 18:16:06 GMT
- Title: Speeding Up Hyper-Heuristics With Markov-Chain Operator Selection and the Only-Worsening Acceptance Operator
- Authors: Abderrahim Bendahi, Benjamin Doerr, Adrien Fradin, Johannes F. Lutzeyer
- Abstract summary: We show impressive performance on a large class of benchmarks including the classic Cliff$_d$ and Jump$_m$ function classes. We replace the all-moves acceptance operator with the operator that only accepts worsenings. We prove a remarkably good runtime of $O(n^{k+1} \log n)$ for our Markov move-acceptance hyper-heuristic.
- Score: 10.404614698222058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The move-acceptance hyper-heuristic was recently shown to be able to leave local optima with astonishing efficiency (Lissovoi et al., Artificial Intelligence (2023)). In this work, we propose two modifications to this algorithm that demonstrate impressive performance on a large class of benchmarks including the classic Cliff$_d$ and Jump$_m$ function classes. (i) Instead of randomly choosing between the only-improving and any-move acceptance operator, we take this choice via a simple two-state Markov chain. This modification alone reduces the runtime on Jump$_m$ functions with gap parameter $m$ from $\Omega(n^{2m-1})$ to $O(n^{m+1})$. (ii) We then replace the all-moves acceptance operator with the operator that only accepts worsenings. Such a counter-intuitive operator has not been used before in the literature. However, our proofs show that our only-worsening operator can greatly help in leaving local optima, reducing, e.g., the runtime on Jump functions to $O(n^3 \log n)$ independent of the gap size. In general, we prove a remarkably good runtime of $O(n^{k+1} \log n)$ for our Markov move-acceptance hyper-heuristic on all members of a new benchmark class SEQOPT$_k$, which contains a large number of functions having $k$ successive local optima, and which contains the commonly studied Jump$_m$ and Cliff$_d$ functions for $k=2$.
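For illustration only, the following is a minimal Python sketch of the Markov move-acceptance idea described in the abstract: a two-state Markov chain toggles between the only-improving and the only-worsening acceptance operators, combined with one-bit mutation. The switch probability, the tie handling, and the OneMax example fitness are assumptions chosen here for concreteness, not details taken from the paper.

```python
import random


def onemax(x):
    """Illustrative fitness function (OneMax); the paper analyses Jump_m,
    Cliff_d and the SEQOPT_k class, not OneMax."""
    return sum(x)


def markov_move_acceptance_hh(f, n, switch_prob=0.01, max_iters=100_000):
    """Sketch of a Markov move-acceptance hyper-heuristic.

    A two-state Markov chain selects between the only-improving (OI) and
    only-worsening (OW) acceptance operators. `switch_prob`, the one-bit
    mutation, and the tie handling are placeholder assumptions; the abstract
    does not specify these details.
    """
    x = [random.randint(0, 1) for _ in range(n)]
    state = "OI"  # start with only-improving acceptance
    best = x[:]
    for _ in range(max_iters):
        # Two-state Markov chain: switch the acceptance operator with
        # probability switch_prob, otherwise keep the current state.
        if random.random() < switch_prob:
            state = "OW" if state == "OI" else "OI"
        # One-bit mutation: flip a single uniformly chosen bit.
        y = x[:]
        i = random.randrange(n)
        y[i] = 1 - y[i]
        # OI accepts non-worsening moves, OW accepts non-improving moves
        # (ties accepted by both here; a modelling choice, not the paper's).
        if state == "OI" and f(y) >= f(x):
            x = y
        elif state == "OW" and f(y) <= f(x):
            x = y
        # The OW state can walk away from good solutions, so track the best.
        if f(x) > f(best):
            best = x[:]
    return best


# Example usage on the illustrative OneMax function with n = 50 bits.
solution = markov_move_acceptance_hh(onemax, n=50)
print(onemax(solution))
```

Because the only-worsening state deliberately moves downhill to escape local optima, the sketch records the best-so-far solution separately from the current search point.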
Related papers
- Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains. We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z) - Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
We show that SignSGD with Majority Voting can robustly work over the whole range of the noise parameter $\kappa$, with high-probability complexity bounds depending on $\kappa$.
arXiv Detail & Related papers (2025-02-11T19:54:11Z) - Hyper-Heuristics Can Profit From Global Variation Operators [12.774575491521926]
We show that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal CLIFF benchmark with remarkable efficiency.
We also show that replacing the local one-bit mutation operator in the MAHH with the global bit-wise mutation operator, commonly used in EAs, yields a runtime of $\min\{1, O(\frac{e \ln(n)}{m})^m\} \cdot O(n^m)$ on JUMP functions.
arXiv Detail & Related papers (2024-07-19T12:10:05Z) - Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions [9.044970217182117]
The decomposition-based multi-objective evolutionary algorithm (MOEA/D) does not directly optimize a given multi-objective function $f$, but instead optimizes $N + 1$ single-objective subproblems of $f$ in a co-evolutionary manner.
We analyze for the first time how the MOEA/D with only standard mutation operators computes the whole Pareto front of the OneMinMax benchmark.
Our overall bound for power-law mutation suggests that the MOEA/D performs best for $N = O(n^{\beta - 1})$, resulting in an $O(n^{\beta})$ runtime.
arXiv Detail & Related papers (2024-05-02T05:32:19Z) - How the Move Acceptance Hyper-Heuristic Copes With Local Optima: Drastic
Differences Between Jumps and Cliffs [6.793248433673384]
We show that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal cliff benchmark with remarkable efficiency.
We also show that the MAHH with the global bit-wise mutation operator instead of the local one-bit operator optimizes jump functions efficiently.
This suggests that combining several ways to cope with local optima can be a fruitful approach.
arXiv Detail & Related papers (2023-04-20T15:57:33Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Provably Breaking the Quadratic Error Compounding Barrier in Imitation
Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$.
We establish an upper bound $O(|\mathcal{S}| H^{3/2} / N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020).
We show the minimax suboptimality grows as $\Omega(H^{3/2}/N)$ when $|\mathcal{S}| \geq 3$, while the unknown-transition setting suffers from a larger sharp rate.
arXiv Detail & Related papers (2021-02-25T15:50:19Z) - Thresholded Lasso Bandit [70.17389393497125]
The Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z) - Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to
Minimax Optimization [133.53164856723782]
We propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained.
Meanwhile, we propose an accelerated momentum descent ascent (Acc-MDA) method for minimax optimization.
In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5} \epsilon^{-3})$ with a batch size $O(\kappa_y^4)$.
arXiv Detail & Related papers (2020-08-18T22:19:29Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Runtime Analysis of a Heavy-Tailed $(1+(\lambda,\lambda))$ Genetic
Algorithm on Jump Functions [9.853329403413701]
Jump functions with jump size $k$ can be optimized in an expected runtime of only $n^{(k + 1)/2} k^{-k/2} e^{O(k)}$ fitness evaluations.
We show that, with natural instance-independent choices of the distribution parameters, this algorithm achieves on all jump functions with jump size at most $n/4$ a performance close to that obtained by the best instance-specific parameters in previous work.
arXiv Detail & Related papers (2020-06-05T15:57:55Z) - A Rigorous Runtime Analysis of the $(1 + (\lambda, \lambda))$ GA on Jump
Functions [8.34061303235504]
We conduct the first runtime analysis of this algorithm on a multimodal problem class, the jump functions benchmark.
For the isolated problem of leaving the local optimum of jump functions, we determine provably optimal parameters that lead to a runtime of $(n/k)^{k/2} e^{\Theta(k)}$.
arXiv Detail & Related papers (2020-04-14T17:54:12Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)