Related papers: Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

URL: http://arxiv.org/abs/2512.03805v1
Date: Wed, 03 Dec 2025 13:54:41 GMT
Title: Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA
Authors: Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang,
Abstract summary: We conduct a systematic analysis of controlling the population size parameter of the (1+($$,$$)-GA on OneMax instances.<n>Our investigation of DDQN and PPO reveals two fundamental challenges that limit their effectiveness in DAC.<n>We introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN agent exploration.
Score: 3.5485296570255183
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges in algorithm configuration. However, applying RL to DAC is challenging and often requires extensive domain expertise. We conduct a comprehensive study of deep-RL algorithms in DAC through a systematic analysis of controlling the population size parameter of the (1+($λ$,$λ$))-GA on OneMax instances. Our investigation of DDQN and PPO reveals two fundamental challenges that limit their effectiveness in DAC: scalability degradation and learning instability. We trace these issues to two primary causes: under-exploration and planning horizon coverage, each of which can be effectively addressed through targeted solutions. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN agent exploration, eliminating the need for instance-specific hyperparameter tuning and ensuring consistent effectiveness across different problem scales. In dealing with the planning horizon coverage problem, we demonstrate that undiscounted learning effectively resolves it in DDQN, while PPO faces fundamental variance issues that necessitate alternative algorithmic designs. We further analyze the hyperparameter dependencies of PPO, showing that while hyperparameter optimization enhances learning stability, it consistently falls short in identifying effective policies across various configurations. Finally, we demonstrate that DDQN equipped with our adaptive reward shifting strategy achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by several orders of magnitude.

Related papers

Multi-Objective Reinforcement Learning for Cognitive Radar Resource Management [13.322245764325125]
We formulate this as a multi-objective optimization problem and employ deep reinforcement learning to find optimal solutions.<n>Our results demonstrate the effectiveness of both algorithms in adapting to various scenarios.<n>This work contributes to the development of more efficient and adaptive cognitive radar systems.
arXiv Detail & Related papers (2025-06-25T21:56:30Z)
Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
Deep Reinforcement Learning Algorithms for Option Hedging [0.20482269513546458]
We compare the performance of eight Deep Reinforcement Learning (DRL) algorithms in the context of dynamic hedging.<n>MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline with the allotted computational budget.
arXiv Detail & Related papers (2025-04-07T21:32:14Z)
Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets. We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG) We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z)
An Improved Artificial Fish Swarm Algorithm for Solving the Problem of Investigation Path Planning [8.725702964289479]
We propose a chaotic artificial fish swarm algorithm based on multiple population differential evolution (DE-CAFSA) We incorporate adaptive field of view and step size adjustments, replace random behavior with the 2-opt operation, and introduce chaos theory and sub-optimal solutions. Experimental results demonstrate that DE-CAFSA outperforms other algorithms on various public datasets of different sizes.
arXiv Detail & Related papers (2023-10-20T09:35:51Z)
Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. We evaluate the proposed method on a problem with orienteering weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
arXiv Detail & Related papers (2022-11-01T21:33:46Z)
Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach [102.00221938474344]
A novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks. A loss function-based update strategy is the core point, which is able to calculate weights for both loss functions of coverage and capacity by a min-norm solver at each update. The numerical results demonstrate that the investigated update strategy outperforms the fixed weight-based MO algorithms.
arXiv Detail & Related papers (2022-04-13T13:52:22Z)
False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
Resource Allocation via Model-Free Deep Learning in Free Space Optical Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications. Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver. We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming. We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.