Reinforcement learning based adaptive metaheuristics
- URL: http://arxiv.org/abs/2206.12233v1
- Date: Fri, 24 Jun 2022 12:01:49 GMT
- Title: Reinforcement learning based adaptive metaheuristics
- Authors: Michele Tessari, Giovanni Iacca
- Abstract summary: We introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms.
We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE)
- Score: 5.254093731341154
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Parameter adaptation, that is the capability to automatically adjust an
algorithm's hyperparameters depending on the problem being faced, is one of the
main trends in evolutionary computation applied to numerical optimization.
While several handcrafted adaptation policies have been proposed over the years
to address this problem, only a few attempts have been made so far at applying
machine learning to learn such policies. Here, we introduce a general-purpose
framework for performing parameter adaptation in continuous-domain
metaheuristics based on state-of-the-art reinforcement learning algorithms. We
demonstrate the applicability of this framework on two algorithms, namely
Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential
Evolution (DE), for which we learn, respectively, adaptation policies for the
step-size (for CMA-ES), and the scale factor and crossover rate (for DE). We
train these policies on a set of 46 benchmark functions at different
dimensionalities, with various inputs to the policies, in two settings: one
policy per function, and one global policy for all functions. Compared,
respectively, to the Cumulative Step-size Adaptation (CSA) policy and to two
well-known adaptive DE variants (iDE and jDE), our policies are able to produce
competitive results in the majority of cases, especially in the case of DE.
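As a rough, minimal sketch of the idea described in the abstract (not the authors' implementation), the Python snippet below runs a standard DE/rand/1/bin loop in which the scale factor F and crossover rate CR are set each generation by an external policy. The observation features, the placeholder policy function, and the sphere objective are illustrative assumptions; the actual framework trains such a policy with state-of-the-art RL algorithms on 46 benchmark functions, and the reward signal and training loop are omitted here.

```python
# Conceptual sketch (not the paper's code): a DE loop whose scale factor F and
# crossover rate CR are picked each generation by an external "policy", mirroring
# the RL-based parameter adaptation idea. Features, policy, and objective are
# illustrative assumptions.
import numpy as np

def sphere(x):                       # toy objective; the paper uses 46 benchmarks
    return float(np.sum(x ** 2))

def observe(fitness):
    # Simple population statistics fed to the policy (an assumed feature set).
    return np.array([fitness.mean(), fitness.std(), fitness.min()])

def policy(obs, rng):
    # Placeholder for a trained RL policy (e.g., PPO): maps observations to (F, CR).
    # Here we just sample values in the usual DE ranges.
    return rng.uniform(0.1, 1.0), rng.uniform(0.0, 1.0)

def de_with_adaptive_params(dim=10, pop_size=20, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fit = np.array([sphere(x) for x in pop])
    for _ in range(generations):
        F, CR = policy(observe(fit), rng)        # parameters set per generation
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     size=3, replace=False)]
            mutant = a + F * (b - c)             # DE/rand/1 mutation
            cross = rng.random(dim) < CR         # binomial crossover mask
            cross[rng.integers(dim)] = True      # ensure at least one gene crosses
            trial = np.where(cross, mutant, pop[i])
            f_trial = sphere(trial)
            if f_trial <= fit[i]:                # greedy selection
                pop[i], fit[i] = trial, f_trial
    return fit.min()

print(de_with_adaptive_params())
```

The same pattern would apply to CMA-ES by letting the policy output the step-size sigma instead of (F, CR), as described in the abstract.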
Related papers
- Functional Acceleration for Policy Mirror Descent [42.08953240415424]
We apply functional acceleration to the Policy Mirror Descent (PMD) general family of algorithms.
By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization.
arXiv Detail & Related papers (2024-07-23T16:04:55Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL (a toy illustration of this idea appears after this list).
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning [17.916366827429034]
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z)
- Causal Policy Gradients [6.123324869194195]
Causal policy gradients (CPGs) provide a common framework for analysing key state-of-the-art algorithms.
CPGs are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes.
arXiv Detail & Related papers (2021-02-20T14:51:12Z)
- Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning [5.476958867922322]
A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.
We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that implements this principle and learns an invariant policy during training.
arXiv Detail & Related papers (2020-06-01T17:28:19Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
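The "Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes" entry above casts hyperparameter selection as a meta-MDP. As a toy, heavily simplified illustration of that idea (an assumption for this sketch, not the paper's method), the snippet below lets a meta-policy choose the step size for an inner gradient-based learner on a family of context-parameterized quadratic tasks, with the meta-reward being the inner learner's improvement; a policy-gradient learner would take the quadratic's place in the actual setting.

```python
# Toy illustration (assumption, not from the paper) of a meta-MDP for step-size
# selection: an outer "meta-policy" picks the step size used by an inner learner,
# and is rewarded by how much the inner learner improves.

def inner_loss(theta, context):
    return context * theta ** 2                  # stand-in for a per-task objective

def inner_grad(theta, context):
    return 2.0 * context * theta

def run_meta_episode(choose_stepsize, contexts, inner_steps=20):
    total_meta_reward = 0.0
    for c in contexts:                           # each context defines one inner task
        theta = 1.0
        for _ in range(inner_steps):
            lr = choose_stepsize(c, theta)       # meta-action: a step size
            before = inner_loss(theta, c)
            theta -= lr * inner_grad(theta, c)   # inner learner's update
            total_meta_reward += before - inner_loss(theta, c)  # meta-reward: improvement
    return total_meta_reward

# A fixed heuristic "meta-policy"; in the paper's setting this mapping would be learned.
print(run_meta_episode(lambda c, th: 0.1 / c, contexts=[0.5, 1.0, 2.0]))
```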
This list is automatically generated from the titles and abstracts of the papers on this site.