Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2505.12982v2
- Date: Wed, 09 Jul 2025 10:18:09 GMT
- Title: Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning
- Authors: Tai Nguyen, Phong Le, Carola Doerr, Nguyen Dang
- Abstract summary: We show how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We derive a simple control policy that consistently outperforms the default theory-recommended setting.
- Score: 4.482691140663255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling one single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work we reconsider the problem of the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although making deep reinforcement learning learn effectively is a challenging task, once it works, it is very powerful and is able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by $27\%$ and the irace-tuned policy, the strongest existing control policy on this benchmark, by $13\%$, for all tested problem sizes up to $40{,}000$.
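As context for the four parameters being decoupled, the following is a minimal Python sketch of the $(1+(\lambda,\lambda))$ GA on OneMax, using the standard coupling of mutation rate $p=\lambda/n$ and crossover bias $c=1/\lambda$ together with a one-fifth success rule on $\lambda$. The update factor F = 1.5 and other constants are illustrative choices, not the settings studied in the paper.

```python
import random

def one_max(x):
    """OneMax fitness: number of one-bits."""
    return sum(x)

def one_plus_lambda_lambda_ga(n, max_evals=100_000, seed=0):
    """Sketch of the (1+(lambda,lambda)) GA on OneMax with the
    self-adjusting one-fifth success rule controlling lambda.
    F = 1.5 is an illustrative update factor, not the paper's setting."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = one_max(x)
    lam, F = 1.0, 1.5
    evals = 1
    while fx < n and evals < max_evals:
        k = max(1, round(lam))
        p = k / n          # mutation rate coupled to lambda
        c = 1.0 / k        # crossover bias coupled to lambda
        # Mutation phase: sample ell ~ Bin(n, p) once, then flip
        # ell random bits in each of k offspring; keep the best.
        ell = sum(rng.random() < p for _ in range(n))
        best_mut, best_mut_f = None, -1
        for _ in range(k):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            fy = one_max(y)
            evals += 1
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy
        # Crossover phase: k biased uniform crossovers between the
        # parent x and the mutation winner; keep the best.
        best_cross, best_cross_f = x, fx
        for _ in range(k):
            z = [b if rng.random() < c else a for a, b in zip(x, best_mut)]
            fz = one_max(z)
            evals += 1
            if fz > best_cross_f:
                best_cross, best_cross_f = z, fz
        # Elitist selection; one-fifth success rule updates lambda.
        if best_cross_f > fx:
            x, fx = best_cross, best_cross_f
            lam = max(1.0, lam / F)
        else:
            lam = min(float(n), lam * F ** 0.25)
    return x, fx, evals
```

The paper's point of departure is that this single self-adjusted $\lambda$ couples all four parameters (offspring sizes of both phases, $p$, and $c$); the reinforcement-learning study decouples them and controls each one separately.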
Related papers
- Enhancing Parameter Control Policies with State Information [0.44241702149260353]
We propose four new benchmarks for which we derive optimal or close-to-optimal control policies. We also consider how additional information about the current state of the algorithm can help to make better choices of parameters.
arXiv Detail & Related papers (2025-07-11T07:31:48Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling [23.989009116398208]
We design a low-switching sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation.
We show that our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\mathrm{poly}(d)/\varepsilon^3)$ samples.
arXiv Detail & Related papers (2023-06-15T23:51:46Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration [32.055812915031666]
We show how to compute optimal parameter portfolios of a given size.
We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values.
We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
arXiv Detail & Related papers (2022-02-07T15:00:30Z) - Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z) - Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
First, we show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z) - Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). We propose the periodically restarted optimistic policy optimization algorithm (PROPO). PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z) - Lazy Parameter Tuning and Control: Choosing All Parameters Randomly From a Power-Law Distribution [8.34061303235504]
Most evolutionary algorithms have multiple parameters and their values drastically affect the performance.
We propose a lazy but effective solution, namely choosing all parameter values in each iteration randomly from a suitably scaled power-law distribution.
We prove a performance guarantee that is comparable to the best performance known for static parameters.
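The lazy control idea can be sketched in a few lines: each iteration, every parameter value is redrawn from a power-law distribution. The exponent beta = 2.5 and the sampling range below are illustrative choices, not the paper's exact scaling.

```python
import random

def power_law_sample(upper, beta=2.5, rng=None):
    """Draw an integer k in {1, ..., upper} with P[k] proportional
    to k**(-beta). beta and the range are illustrative; the paper's
    guarantee holds for a suitably scaled power-law distribution."""
    rng = rng or random.Random()
    ks = list(range(1, upper + 1))
    weights = [k ** -beta for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]

# In each EA iteration, every parameter would be redrawn lazily, e.g.:
# lam = power_law_sample(n // 2); p = lam / n; c = 1 / lam
```

Small values dominate the samples, while the heavy tail occasionally tries large jumps; this mix is what yields a performance guarantee comparable to the best known static parameters.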
arXiv Detail & Related papers (2021-04-14T09:17:18Z) - Self-Adjusting Population Sizes for Non-Elitist Evolutionary Algorithms: Why Success Rates Matter [0.0]
Evolutionary algorithms (EAs) are general-purpose optimisers that come with several parameters like the sizes of parent and offspring populations or the mutation rate.
Recent theoretical studies have shown that self-adjusting parameter control mechanisms can outperform the best static parameters in EAs on discrete problems.
arXiv Detail & Related papers (2021-04-12T16:44:54Z) - Hybridizing the 1/5-th Success Rule with Q-Learning for Controlling the Mutation Rate of an Evolutionary Algorithm [0.9281671380673306]
We introduce in this work a new hybrid parameter control technique, which combines the well-known one-fifth success rule with Q-learning.
We demonstrate that our HQL mechanism achieves performance equal or superior to all techniques tested in [Rodionova et al., GECCO'19], and, in contrast to previous parameter control methods, it does so simultaneously for all offspring population sizes $\lambda$.
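A minimal flavor of Q-learning applied to mutation-rate control might look as follows. This is a simplified sketch, not the paper's HQL mechanism: the state is the current mutation strength, the actions halve, keep, or double it, the reward is the fitness gain, and the one-fifth-rule fallback of HQL is omitted.

```python
import random

def q_mutation_control(n, max_iters=200_000, seed=3):
    """Illustrative Q-learning controller for the mutation strength r
    of an elitist EA on OneMax (a simplified sketch, not the paper's
    HQL): state = current r, actions multiply r by {0.5, 1, 2},
    reward = fitness gain, no one-fifth-rule fallback."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = sum(x)
    r = 1                               # bits flipped per mutation
    q = {}                              # Q[(r, action)] -> value
    alpha, gamma, eps = 0.5, 0.8, 0.2   # illustrative hyperparameters
    actions = (0.5, 1.0, 2.0)
    for _ in range(max_iters):
        if fx == n:
            break
        # epsilon-greedy choice of the multiplicative action on r
        if rng.random() < eps:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda b: q.get((r, b), 0.0))
        r_new = min(max(int(r * a), 1), n // 2)
        # elitist step: flip exactly r_new distinct bits, keep if no worse
        y = x[:]
        for i in rng.sample(range(n), r_new):
            y[i] ^= 1
        fy = sum(y)
        # Q-update with the fitness gain as reward
        reward = max(fy - fx, 0)
        best_next = max(q.get((r_new, b), 0.0) for b in actions)
        old = q.get((r, a), 0.0)
        q[(r, a)] = old + alpha * (reward + gamma * best_next - old)
        if fy >= fx:
            x, fx = y, fy
        r = r_new
    return fx
```

The design choice mirrored from the paper's setting is that the reward signal is the raw fitness gain, so the learned Q-values directly favor rates that produce improvements at the current stage of the search.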
arXiv Detail & Related papers (2020-06-19T09:12:49Z) - Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO).
OPPO achieves $\widetilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.