Evolutionary Policy Optimization
- URL: http://arxiv.org/abs/2504.12568v1
- Date: Thu, 17 Apr 2025 01:33:06 GMT
- Title: Evolutionary Policy Optimization
- Authors: Zelal Su "Lain" Mustafaoglu, Keshav Pingali, Risto Miikkulainen
- Abstract summary: A key challenge in reinforcement learning is managing the exploration-exploitation trade-off without sacrificing sample efficiency. This paper proposes Evolutionary Policy Optimization (EPO), a hybrid algorithm that integrates neuroevolution with policy gradient methods for policy optimization. Experimental results show that EPO improves both policy quality and sample efficiency compared to standard PG and EC methods.
- Score: 9.519528646219054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in reinforcement learning (RL) is managing the exploration-exploitation trade-off without sacrificing sample efficiency. Policy gradient (PG) methods excel in exploitation through fine-grained, gradient-based optimization but often struggle with exploration due to their focus on local search. In contrast, evolutionary computation (EC) methods excel in global exploration, but lack mechanisms for exploitation. To address these limitations, this paper proposes Evolutionary Policy Optimization (EPO), a hybrid algorithm that integrates neuroevolution with policy gradient methods for policy optimization. EPO leverages the exploration capabilities of EC and the exploitation strengths of PG, offering an efficient solution to the exploration-exploitation dilemma in RL. EPO is evaluated on the Atari Pong and Breakout benchmarks. Experimental results show that EPO improves both policy quality and sample efficiency compared to standard PG and EC methods, making it effective for tasks that require both exploration and local optimization.
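The abstract describes the hybrid only at a high level. Below is a minimal, self-contained sketch of the general neuroevolution-plus-policy-gradient pattern it refers to, written against a toy 1-D task; the environment, hyperparameters, and function names are illustrative assumptions, not the authors' implementation or their Atari setup.

```python
# Hedged sketch of an EC + PG hybrid loop (illustrative only, not EPO's actual code).
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    """Toy stand-in for an RL rollout: return peaks at theta = 3."""
    return -float((theta - 3.0) ** 2)

def pg_step(theta, lr=0.05, sigma=0.1, samples=16):
    """Crude score-function (REINFORCE-style) gradient step on a Gaussian policy."""
    noise = rng.normal(0.0, sigma, samples)
    returns = np.array([episode_return(theta + n) for n in noise])
    grad = np.mean((returns - returns.mean()) * noise) / sigma ** 2
    return theta + lr * grad

def hybrid_loop(pop_size=8, elites=2, generations=30, mut_sigma=0.5):
    population = rng.normal(0.0, 2.0, pop_size)                 # global initialisation (EC)
    for _ in range(generations):
        population = np.array([pg_step(t) for t in population])  # local exploitation (PG)
        order = np.argsort([episode_return(t) for t in population])[::-1]
        parents = population[order[:elites]]                      # selection of elites
        children = (rng.choice(parents, pop_size - elites)
                    + rng.normal(0.0, mut_sigma, pop_size - elites))  # mutation = exploration
        population = np.concatenate([parents, children])
    return max(population, key=episode_return)

print(round(hybrid_loop(), 2))   # should land near the optimum at 3.0
```

The design point the sketch tries to capture is the one stated in the abstract: the population-level mutation and selection provide global exploration, while the per-member gradient steps provide fine-grained local exploitation.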
Related papers
- Evolutionary Policy Optimization [47.30139909878251]
Current on-policy methods fail to fully leverage the benefits of parallelized environments. EPO is a novel policy gradient algorithm that combines the strengths of EA and policy gradients. We show that EPO significantly improves performance across diverse and challenging environments.
arXiv Detail & Related papers (2025-03-24T18:08:54Z) - Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL)
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO). Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions. We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
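As a rough illustration of the Q-weighting idea mentioned in the QVPO summary above, one can weight a policy's log-likelihood on sampled actions by a non-negative transform of their Q-values; this is a generic sketch, not the paper's exact variational bound, and all names are hypothetical.

```python
# Hedged sketch of a Q-weighted policy loss (illustrative, not QVPO's exact objective).
import numpy as np

def q_weighted_loss(log_probs, q_values):
    """Weight each sampled action's negative log-likelihood by a non-negative
    transform of its Q-value, so high-value actions dominate the update."""
    weights = np.maximum(q_values, 0.0)            # simple non-negative weighting
    weights = weights / (weights.sum() + 1e-8)     # normalise for a stable scale
    return float(-(weights * log_probs).sum())

# Toy usage: three sampled actions with their log-probabilities and critic estimates.
print(q_weighted_loss(np.array([-0.2, -1.5, -0.7]),
                      np.array([ 2.0,  0.5, -1.0])))
```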
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO)
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method.
arXiv Detail & Related papers (2024-02-16T19:35:58Z) - Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Diverse Policy Optimization for Structured Action Space [59.361076277997704]
We propose Diverse Policy Optimization (DPO) to model policies in structured action spaces as energy-based models (EBMs).
A novel and powerful generative model, GFlowNet, is introduced as an efficient, diverse EBM-based policy sampler.
Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies.
arXiv Detail & Related papers (2023-02-23T10:48:09Z) - Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble [43.95417785185457]
It is challenging for reinforcement learning algorithms to succeed in real-world applications such as financial trading and logistics systems.
We propose Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner.
EPPO achieves higher efficiency and greater robustness in real-world applications than vanilla policy optimization algorithms and other ensemble methods.
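The EPPO summary above refers to learning an ensemble of policies end to end; one generic way to act with such an ensemble, shown below purely as a hypothetical sketch, is to average the member policies' action distributions (EPPO's actual training and aggregation may differ).

```python
# Minimal sketch of acting with an ensemble of discrete-action policies (illustrative only).
import numpy as np

def ensemble_action_probs(member_probs):
    """Average the action distributions produced by the ensemble members."""
    stacked = np.stack(member_probs)               # shape: (members, actions)
    return stacked.mean(axis=0)

members = [np.array([0.7, 0.2, 0.1]),
           np.array([0.5, 0.3, 0.2]),
           np.array([0.6, 0.1, 0.3])]
probs = ensemble_action_probs(members)
action = int(np.argmax(probs))                     # greedy action from the averaged distribution
print(probs, action)
```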
arXiv Detail & Related papers (2022-05-19T02:25:32Z) - Evolutionary Action Selection for Gradient-based Policy Learning [6.282299638495976]
Evolutionary algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been combined to integrate the advantages of the two solutions for better policy learning.
We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and DRL.
arXiv Detail & Related papers (2022-01-12T03:31:21Z) - Proximal Policy Optimization via Enhanced Exploration Efficiency [6.2501569560329555]
The proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance.
This paper analyzes the assumptions behind the original Gaussian action exploration mechanism in the PPO algorithm and clarifies the influence of exploration ability on performance.
We propose an intrinsic exploration module (IEM-PPO) that can be used in complex environments.
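The IEM-PPO summary above mentions adding an intrinsic exploration signal on top of PPO. A common, generic form of this idea, not necessarily the paper's specific module, is to add a decaying novelty bonus to the environment reward, as sketched below with hypothetical names.

```python
# Generic intrinsic-reward augmentation (illustrative; not IEM-PPO's exact module).
from collections import defaultdict

class CountNoveltyBonus:
    """Count-based novelty bonus: rarely visited (discretised) states earn extra reward."""
    def __init__(self, scale=0.1):
        self.scale = scale
        self.counts = defaultdict(int)

    def augment(self, state_key, extrinsic_reward):
        self.counts[state_key] += 1
        intrinsic = self.scale / (self.counts[state_key] ** 0.5)   # decays with visits
        return extrinsic_reward + intrinsic

bonus = CountNoveltyBonus()
print(bonus.augment("s0", 1.0))   # first visit: larger bonus
print(bonus.augment("s0", 1.0))   # repeat visit: smaller bonus
```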
arXiv Detail & Related papers (2020-11-11T03:03:32Z) - Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO)
OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)