Enabling surrogate-assisted evolutionary reinforcement learning via
policy embedding
- URL: http://arxiv.org/abs/2301.13374v1
- Date: Tue, 31 Jan 2023 02:36:06 GMT
- Title: Enabling surrogate-assisted evolutionary reinforcement learning via
policy embedding
- Authors: Lan Tang, Xiaxi Li, Jinyuan Zhang, Guiying Li, Peng Yang and Ke Tang
- Abstract summary: This paper proposes a PE-SAERL Framework to enable surrogate-assisted evolutionary reinforcement learning via policy embedding.
Empirical results on 5 Atari games show that the proposed method can perform more efficiently than the four state-of-the-art algorithms.
- Score: 28.272572839321104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolutionary Reinforcement Learning (ERL), which applies Evolutionary
Algorithms (EAs) to optimize the weight parameters of Deep Neural Network (DNN)
based policies, has been widely regarded as an alternative to traditional
reinforcement learning methods. However, evaluating the iteratively generated
population usually requires a large amount of computational time and can be
prohibitively expensive, which may restrict the applicability of ERL. Surrogates
are often used to reduce the computational burden of evaluation in EAs.
Unfortunately, in ERL each individual policy usually comprises millions of DNN
weight parameters. This high-dimensional policy representation poses a great
challenge to applying surrogates in ERL to speed up training. This paper proposes
the PE-SAERL framework, which for the first time enables surrogate-assisted
evolutionary reinforcement learning via policy embedding (PE). Empirical results
on 5 Atari games show that the proposed method performs more efficiently than
four state-of-the-art algorithms. The training process is accelerated by up to 7x
on the tested games compared with its counterpart without the surrogate and PE.
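To make the idea concrete, the loop below is a minimal sketch of surrogate-assisted evolutionary search with a policy embedding: candidates are generated in the full weight space, mapped into a low-dimensional embedding, pre-screened by a cheap surrogate, and only the most promising ones receive true (expensive) environment evaluations. The random-projection embedding, the k-nearest-neighbour surrogate, and all names are illustrative assumptions, not the paper's actual components.

```python
# Minimal sketch of surrogate-assisted evolutionary RL with a policy embedding (PE).
# The random-projection embedding and k-NN surrogate are illustrative choices only.
import numpy as np

DIM = 100_000          # number of policy weights (stand-in for a DNN policy)
EMBED_DIM = 32         # low-dimensional embedding used by the surrogate
POP_SIZE = 20
SCREEN_KEEP = 5        # candidates that survive surrogate pre-screening

rng = np.random.default_rng(0)
projection = rng.normal(size=(DIM, EMBED_DIM)) / np.sqrt(DIM)  # policy embedding

def embed(weights):
    """Map a high-dimensional weight vector to a compact representation."""
    return weights @ projection

def true_fitness(weights):
    """Expensive evaluation: run the policy in the environment (mocked here)."""
    return -np.linalg.norm(weights - 1.0) / np.sqrt(DIM)

def surrogate_fitness(z, archive_z, archive_f, k=3):
    """Cheap k-NN surrogate operating in the embedding space."""
    d = np.linalg.norm(archive_z - z, axis=1)
    return archive_f[np.argsort(d)[:k]].mean()

parent = np.zeros(DIM)
archive_z = [embed(parent)]
archive_f = [true_fitness(parent)]

for gen in range(10):
    offspring = [parent + 0.1 * rng.normal(size=DIM) for _ in range(POP_SIZE)]
    # Pre-screen with the surrogate; only the top candidates get real rollouts.
    scores = [surrogate_fitness(embed(w), np.array(archive_z), np.array(archive_f))
              for w in offspring]
    survivors = [offspring[i] for i in np.argsort(scores)[-SCREEN_KEEP:]]
    evaluated = [(true_fitness(w), w) for w in survivors]
    for f, w in evaluated:
        archive_z.append(embed(w))
        archive_f.append(f)
    parent = max(evaluated, key=lambda t: t[0])[1]
    print(f"gen {gen}: best true fitness {max(f for f, _ in evaluated):.4f}")
```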
Related papers
- GPU-Accelerated Rule Evaluation and Evolution [10.60691612679966]
This paper introduces an innovative approach to boost the efficiency and scalability of Evolutionary Rule-based machine Learning (ERL).
The proposed method, AERL (Accelerated ERL), addresses the fitness-evaluation bottleneck in two ways.
First, by adopting GPU-optimized rule sets through a tensorized representation within the PyTorch framework, AERL mitigates the bottleneck and accelerates fitness evaluation significantly.
Second, AERL takes further advantage of the GPU by fine-tuning the rule coefficients via back-propagation, thereby improving search space exploration.
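As a rough illustration of the tensorized rule representation described above, the sketch below evaluates a batch of samples against many interval rules as a single tensor operation and fine-tunes per-rule coefficients by back-propagation; the rule encoding and all names are assumptions rather than AERL's actual data structures.

```python
# Sketch: evaluating many interval rules against a batch of samples as tensor ops,
# then tuning per-rule coefficients by back-propagation. Rule encoding is assumed.
# Moving the tensors to a GPU device would parallelize the evaluation further.
import torch

n_rules, n_features, n_samples = 64, 10, 4096
lower = torch.rand(n_rules, n_features) * 0.4            # rule lower bounds
upper = lower + 0.3                                       # rule upper bounds
coef = torch.zeros(n_rules, requires_grad=True)           # per-rule vote weights
X = torch.rand(n_samples, n_features)
y = torch.rand(n_samples)

def rule_votes(X):
    # inside[s, r, f] is True if sample s satisfies condition f of rule r
    inside = (X[:, None, :] >= lower) & (X[:, None, :] <= upper)
    match = inside.all(dim=-1).float()                     # rule fires on sample
    return match @ coef                                    # weighted vote per sample

opt = torch.optim.Adam([coef], lr=1e-2)
for step in range(200):
    loss = torch.nn.functional.mse_loss(rule_votes(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```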
arXiv Detail & Related papers (2024-06-03T22:24:12Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to PPO and DPO.
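One plausible reading of "regressing relative rewards" is a least-squares fit of the change in policy log-probabilities between a pair of responses to their reward difference; the toy loss below sketches that reading, with the exact form and the eta scale being assumptions.

```python
# Assumed reading of "regressing relative rewards": fit the policy's relative
# log-probability change on a pair of responses to their reward gap.
import torch

def rebel_style_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                     reward_a, reward_b, eta=1.0):
    pred_gap = ((logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)) / eta
    true_gap = reward_a - reward_b
    return torch.mean((pred_gap - true_gap) ** 2)

# Toy usage with random tensors standing in for per-response log-probs and rewards.
loss = rebel_style_loss(torch.randn(8), torch.randn(8), torch.randn(8),
                        torch.randn(8), torch.rand(8), torch.rand(8))
```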
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
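A generic way to realize such a distillation is supervised regression of the expert's actions on the states it visits; the sketch below illustrates that pattern with a gradient-boosted student (the mocked data and model choices are illustrative, not the paper's pipeline).

```python
# Generic policy-distillation sketch: fit a gradient-boosted regressor to imitate
# a neural expert on states it visits. Expert and data are mocked stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
states = rng.normal(size=(5000, 8))                         # states visited by the expert
expert_actions = np.tanh(states @ rng.normal(size=(8,)))    # stand-in expert policy

student = GradientBoostingRegressor(n_estimators=200, max_depth=3)
student.fit(states, expert_actions)
print("distillation MSE:", np.mean((student.predict(states) - expert_actions) ** 2))
```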
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning [0.0]
Adapters have proven effective in supervised learning contexts such as natural language processing and computer vision.
This paper presents an innovative adaptation strategy that improves training efficiency and the performance of the base agent.
Our proposed universal approach is not only compatible with pre-trained neural networks but also with rule-based agents, offering a means to integrate human expertise.
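A common way to implement such an adapter is to freeze the base agent and train a small residual module on its outputs, which works for neural and rule-based bases alike; the sketch below shows that generic pattern (module sizes and names are assumptions).

```python
# Generic adapter pattern: freeze a pre-trained base policy and train only a small
# residual adapter on its action logits. Sizes and names are illustrative.
import torch
import torch.nn as nn

class AdaptedPolicy(nn.Module):
    def __init__(self, base: nn.Module, n_actions: int, hidden: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # base agent stays frozen
            p.requires_grad_(False)
        self.adapter = nn.Sequential(
            nn.Linear(n_actions, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        logits = self.base(obs)
        return logits + self.adapter(logits)      # residual correction

base = nn.Linear(4, 2)                            # stand-in for a pre-trained agent
policy = AdaptedPolicy(base, n_actions=2)
out = policy(torch.randn(1, 4))                   # only adapter params receive grads
```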
arXiv Detail & Related papers (2023-11-20T04:54:51Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component of evolutionary algorithms has demonstrated superior performance in recent years.
We discuss RL-EA integration methods, the RL-assisted strategies adopted by RL-EAs, and their applications according to the existing literature.
In the section on RL-EA applications, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
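As one concrete example of such an integration, the sketch below uses a simple epsilon-greedy value estimate to pick which mutation operator an EA applies each generation, rewarding the choice by the fitness improvement it produces; the operators and reward signal are illustrative assumptions.

```python
# Minimal sketch of RL-assisted operator selection in an EA: an epsilon-greedy
# learner picks a mutation operator each generation and is rewarded by the
# fitness improvement it produces. All specifics are illustrative.
import numpy as np

rng = np.random.default_rng(0)
operators = [lambda x: x + 0.1 * rng.normal(size=x.shape),   # small mutation
             lambda x: x + 1.0 * rng.normal(size=x.shape)]   # large mutation
q = np.zeros(len(operators))                                 # value estimate per operator
counts = np.zeros(len(operators))

def fitness(x):
    return -np.sum(x ** 2)

x = rng.normal(size=5)
for gen in range(200):
    op = int(rng.integers(len(operators))) if rng.random() < 0.1 else int(np.argmax(q))
    child = operators[op](x)
    reward = fitness(child) - fitness(x)          # improvement as the RL reward
    counts[op] += 1
    q[op] += (reward - q[op]) / counts[op]        # incremental mean update
    if reward > 0:
        x = child
```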
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- A reinforcement learning strategy for p-adaptation in high order solvers [0.0]
Reinforcement learning (RL) has emerged as a promising approach to automating decision processes.
This paper explores the application of RL techniques to optimise the order in the computational mesh when using high-order solvers.
arXiv Detail & Related papers (2023-06-14T07:01:31Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
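Learning a hidden reward from pair-wise trajectory preferences is commonly cast as a Bradley-Terry style objective in which the preferred trajectory should receive the higher predicted return; the sketch below shows that generic pattern with mocked data, not the paper's specific exploration scheme.

```python
# Generic preference-based reward learning: fit a reward model so that the
# preferred trajectory in each pair gets the higher total predicted reward
# (Bradley-Terry style loss). Data and model are mocked stand-ins.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def traj_return(traj):                       # traj: (T, state_dim) tensor
    return reward_model(traj).sum()

# Mock preference data: pairs of trajectories where the first is preferred.
pairs = [(torch.randn(20, 6) + 0.5, torch.randn(20, 6)) for _ in range(64)]

for epoch in range(50):
    loss = 0.0
    for preferred, other in pairs:
        margin = traj_return(preferred) - traj_return(other)
        loss = loss - nn.functional.logsigmoid(margin)   # maximize P(preferred > other)
    opt.zero_grad()
    loss.backward()
    opt.step()
```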
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas [4.873362301533824]
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL).
The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space.
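The mechanism described, a population of learners sharing an experience buffer with occasional crossover and mutation of their parameters, can be sketched generically as below; the uniform crossover, mutation scale, and all names are assumptions, and the gradient-based RL updates are left as a stub.

```python
# Sketch: a population of agents that share experience and occasionally exchange
# parameters through crossover and mutation. Gradient-based updates are stubbed.
import copy
import numpy as np

rng = np.random.default_rng(0)

class Agent:
    def __init__(self):
        self.weights = rng.normal(size=256)
    def fitness(self):
        return -np.sum(self.weights ** 2)     # stand-in for episodic return

def crossover(a, b):
    child = copy.deepcopy(a)
    mask = rng.random(child.weights.shape) < 0.5
    child.weights[mask] = b.weights[mask]     # uniform parameter crossover
    return child

population = [Agent() for _ in range(8)]
shared_buffer = []                            # common experience buffer (unused stub)

for gen in range(20):
    # ... each agent would do gradient-based RL updates from shared_buffer here ...
    population.sort(key=lambda a: a.fitness(), reverse=True)
    child = crossover(population[0], population[1])
    child.weights += 0.05 * rng.normal(size=child.weights.shape)  # mutation
    population[-1] = child                    # replace the weakest agent
```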
arXiv Detail & Related papers (2023-05-10T09:46:53Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
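One common reading of this two-policy scheme is that a guide policy (e.g. derived from offline data or demonstrations) acts for the first h steps of each episode and the learning policy takes over afterwards, with h shrunk over training; the rollout sketch below illustrates that hand-off with mocked policies and environment.

```python
# Sketch of a jump-started rollout: a guide policy acts for the first h steps,
# then the exploration (learning) policy takes over; h is shrunk over training.
# Environment and policies are mocked stand-ins.
import numpy as np

rng = np.random.default_rng(0)
guide_policy = lambda obs: 0                       # e.g. a pre-trained/offline policy
exploration_policy = lambda obs: int(rng.integers(2))  # the policy being trained

def rollout(env_step, horizon, h):
    obs, total = 0.0, 0.0
    for t in range(horizon):
        action = guide_policy(obs) if t < h else exploration_policy(obs)
        obs, reward = env_step(obs, action)
        total += reward
    return total

mock_env_step = lambda obs, a: (obs + a, float(a == 0))

h = 50                                             # jump-start length (curriculum)
for iteration in range(10):
    ret = rollout(mock_env_step, horizon=100, h=h)
    h = max(0, h - 5)                              # gradually hand control to the learner
```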
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Deep Networks with Fast Retraining [0.0]
This paper proposes a novel Moore-Penrose (MP) inverse-based fast retraining strategy for deep convolutional neural network (DCNN) learning.
In each training round, a random learning strategy is first used to control the number of convolutional layers trained in the backward pass.
Then, an MP inverse-based batch-by-batch learning strategy, which enables the network to be implemented without access to industrial-scale computational resources, is developed.
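An MP inverse-based retraining step typically means solving a layer's weights in closed form by least squares rather than gradient descent; the sketch below accumulates the normal equations batch by batch and solves for the output weights with a pseudoinverse, as a generic illustration rather than the paper's exact scheme.

```python
# Sketch: retrain a network's last linear layer in closed form with the
# Moore-Penrose pseudoinverse, accumulating statistics batch by batch so the
# full dataset never has to be held in memory at once. Generic illustration.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_classes = 128, 10

# Accumulate H^T H and H^T Y over batches (H = features from the frozen layers).
HtH = np.zeros((hidden_dim, hidden_dim))
HtY = np.zeros((hidden_dim, n_classes))
for _ in range(20):                               # 20 mocked batches
    H = rng.normal(size=(256, hidden_dim))        # stand-in for extracted features
    Y = np.eye(n_classes)[rng.integers(n_classes, size=256)]  # one-hot targets
    HtH += H.T @ H
    HtY += H.T @ Y

# Closed-form output weights via the pseudoinverse (small ridge term for stability).
W_out = np.linalg.pinv(HtH + 1e-3 * np.eye(hidden_dim)) @ HtY
```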
arXiv Detail & Related papers (2020-08-13T15:17:38Z)