Reinforcement learning based adaptive metaheuristics
- URL: http://arxiv.org/abs/2206.12233v1
- Date: Fri, 24 Jun 2022 12:01:49 GMT
- Title: Reinforcement learning based adaptive metaheuristics
- Authors: Michele Tessari, Giovanni Iacca
- Abstract summary: We introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms.
We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE)
- Score: 5.254093731341154
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Parameter adaptation, that is the capability to automatically adjust an
algorithm's hyperparameters depending on the problem being faced, is one of the
main trends in evolutionary computation applied to numerical optimization.
While several handcrafted adaptation policies have been proposed over the years
to address this problem, only a few attempts have been made so far at applying
machine learning to learn such policies. Here, we introduce a general-purpose
framework for performing parameter adaptation in continuous-domain
metaheuristics based on state-of-the-art reinforcement learning algorithms. We
demonstrate the applicability of this framework on two algorithms, namely
Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential
Evolution (DE), for which we learn, respectively, adaptation policies for the
step-size (for CMA-ES), and the scale factor and crossover rate (for DE). We
train these policies on a set of 46 benchmark functions at different
dimensionalities, with various inputs to the policies, in two settings: one
policy per function, and one global policy for all functions. Compared,
respectively, to the Cumulative Step-size Adaptation (CSA) policy and to two
well-known adaptive DE variants (iDE and jDE), our policies are able to produce
competitive results in the majority of cases, especially in the case of DE.
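As a rough, minimal sketch of the idea described in the abstract (not the authors' implementation), the Python snippet below runs a standard DE/rand/1/bin loop in which the scale factor F and crossover rate CR are set each generation by an external policy. The observation features, the placeholder policy function, and the sphere objective are illustrative assumptions; the actual framework trains such a policy with state-of-the-art RL algorithms on 46 benchmark functions, and the reward signal and training loop are omitted here.

```python
# Conceptual sketch (not the paper's code): a DE loop whose scale factor F and
# crossover rate CR are picked each generation by an external "policy", mirroring
# the RL-based parameter adaptation idea. Features, policy, and objective are
# illustrative assumptions.
import numpy as np

def sphere(x):                       # toy objective; the paper uses 46 benchmarks
    return float(np.sum(x ** 2))

def observe(fitness):
    # Simple population statistics fed to the policy (an assumed feature set).
    return np.array([fitness.mean(), fitness.std(), fitness.min()])

def policy(obs, rng):
    # Placeholder for a trained RL policy (e.g., PPO): maps observations to (F, CR).
    # Here we just sample values in the usual DE ranges.
    return rng.uniform(0.1, 1.0), rng.uniform(0.0, 1.0)

def de_with_adaptive_params(dim=10, pop_size=20, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fit = np.array([sphere(x) for x in pop])
    for _ in range(generations):
        F, CR = policy(observe(fit), rng)        # parameters set per generation
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     size=3, replace=False)]
            mutant = a + F * (b - c)             # DE/rand/1 mutation
            cross = rng.random(dim) < CR         # binomial crossover mask
            cross[rng.integers(dim)] = True      # ensure at least one gene crosses
            trial = np.where(cross, mutant, pop[i])
            f_trial = sphere(trial)
            if f_trial <= fit[i]:                # greedy selection
                pop[i], fit[i] = trial, f_trial
    return fit.min()

print(de_with_adaptive_params())
```

The same pattern would apply to CMA-ES by letting the policy output the step-size sigma instead of (F, CR), as described in the abstract.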
Related papers
- Functional Acceleration for Policy Mirror Descent [42.08953240415424]
We apply functional acceleration to the Policy Mirror Descent (PMD) general family of algorithms.
By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization.
arXiv Detail & Related papers (2024-07-23T16:04:55Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL (a toy illustration of this idea appears after this list).
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning [17.916366827429034]
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z)
- Causal Policy Gradients [6.123324869194195]
Causal policy gradients (CPGs) provide a common framework for analysing key state-of-the-art algorithms.
CPGs are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes.
arXiv Detail & Related papers (2021-02-20T14:51:12Z)
- Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning [5.476958867922322]
A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.
We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that implements this principle and learns an invariant policy during training.
arXiv Detail & Related papers (2020-06-01T17:28:19Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
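The "Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes" entry above casts hyperparameter selection as a meta-MDP. As a toy, heavily simplified illustration of that idea (an assumption for this sketch, not the paper's method), the snippet below lets a meta-policy choose the step size for an inner gradient-based learner on a family of context-parameterized quadratic tasks, with the meta-reward being the inner learner's improvement; a policy-gradient learner would take the quadratic's place in the actual setting.

```python
# Toy illustration (assumption, not from the paper) of a meta-MDP for step-size
# selection: an outer "meta-policy" picks the step size used by an inner learner,
# and is rewarded by how much the inner learner improves.

def inner_loss(theta, context):
    return context * theta ** 2                  # stand-in for a per-task objective

def inner_grad(theta, context):
    return 2.0 * context * theta

def run_meta_episode(choose_stepsize, contexts, inner_steps=20):
    total_meta_reward = 0.0
    for c in contexts:                           # each context defines one inner task
        theta = 1.0
        for _ in range(inner_steps):
            lr = choose_stepsize(c, theta)       # meta-action: a step size
            before = inner_loss(theta, c)
            theta -= lr * inner_grad(theta, c)   # inner learner's update
            total_meta_reward += before - inner_loss(theta, c)  # meta-reward: improvement
    return total_meta_reward

# A fixed heuristic "meta-policy"; in the paper's setting this mapping would be learned.
print(run_meta_episode(lambda c, th: 0.1 / c, contexts=[0.5, 1.0, 2.0]))
```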
This list is automatically generated from the titles and abstracts of the papers on this site.