Evolutionary Action Selection for Gradient-based Policy Learning
- URL: http://arxiv.org/abs/2201.04286v1
- Date: Wed, 12 Jan 2022 03:31:21 GMT
- Title: Evolutionary Action Selection for Gradient-based Policy Learning
- Authors: Yan Ma, Tianxing Liu, Bingsheng Wei, Yi Liu, Kang Xu, Wei Li
- Abstract summary: Evolutionary algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been combined to integrate the advantages of the two solutions for better policy learning.
We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and DRL.
- Score: 6.282299638495976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have
recently been combined to integrate the advantages of the two solutions for
better policy learning. However, in existing hybrid methods, EA is used to
directly train the policy network, which leads to sample inefficiency and an
unpredictable impact on policy performance. To better integrate the two
approaches and avoid the drawbacks introduced by EA, we devise a more
efficient and principled method of combining EA and DRL. In this paper, we
propose Evolutionary Action Selection-Twin Delayed
Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and
DRL. In evolutionary action selection (EAS), we focus on optimizing the
action chosen by the policy network, using an evolutionary algorithm to
obtain high-quality actions that guide policy learning. We conduct several
experiments on challenging continuous control tasks. The results show that
EAS-TD3 achieves superior performance over other state-of-the-art methods.
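A minimal, hypothetical sketch of the evolutionary action selection idea follows, assuming a generic (mu + lambda) evolution strategy guided by the critic's Q-value; the function names and the Gaussian-perturbation operators are illustrative assumptions, not the paper's exact EAS operators.

```python
import numpy as np

def evolve_action(q_func, state, base_action, pop_size=16, n_iters=5,
                  sigma=0.1, low=-1.0, high=1.0):
    """Hypothetical evolutionary action selection: perturb the policy's
    proposed action and keep the candidates the critic scores highest."""
    pop = np.clip(base_action + sigma * np.random.randn(pop_size, base_action.size),
                  low, high)
    pop[0] = base_action  # elitism: always include the policy's own proposal
    for _ in range(n_iters):
        scores = np.array([q_func(state, a) for a in pop])
        elite = pop[np.argsort(scores)[-pop_size // 4:]]  # top 25% by Q-value
        pop = np.clip(np.repeat(elite, 4, axis=0)
                      + sigma * np.random.randn(pop_size, base_action.size),
                      low, high)
        pop[0] = elite[-1]  # carry the current best forward unmutated
    scores = np.array([q_func(state, a) for a in pop])
    return pop[int(np.argmax(scores))]
```

One plausible use, consistent with the abstract, is to store the evolved action with the transition and pull the actor toward it via an auxiliary supervised loss on top of TD3's deterministic policy gradient; consult the paper for the exact update.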
Related papers
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy (a generic distillation step is sketched after this list).
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Diverse Policy Optimization for Structured Action Space [59.361076277997704]
We propose Diverse Policy Optimization (DPO) to model policies in a structured action space as energy-based models (EBMs).
A novel generative model, GFlowNet, is introduced as an efficient, diverse sampler for the EBM-based policies.
Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies.
arXiv Detail & Related papers (2023-02-23T10:48:09Z)
- ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation [31.9768280877473]
We propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$).
All EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations (a minimal architecture sketch appears after this list).
Experiments on a range of continuous control tasks show that ERL-Re$^2$ consistently outperforms advanced baselines and achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2022-10-26T10:34:48Z)
- Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control [75.28441662678394]
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages.
We propose several improvements on top of these approaches to learn global control policies more quickly.
arXiv Detail & Related papers (2022-09-19T13:32:09Z)
- Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning [7.020079427649125]
We show that learning distinguishable skills for tasks with non-unique optima can be essential for further improving learning efficiency and performance.
We propose a probabilistic mixture-of-experts (PMOE) for multimodal policies, together with a novel gradient estimator for the non-differentiability problem (see the mixture-policy sketch after this list).
arXiv Detail & Related papers (2021-04-19T08:21:56Z)
- SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
arXiv Detail & Related papers (2020-06-25T13:06:59Z)
- Adaptive strategy in differential evolution via explicit exploitation and exploration controls [0.0]
This paper proposes a new strategy adaptation method, named the explicit adaptation scheme (Ea scheme).
The Ea scheme separates multiple strategies and employs them on demand (see the differential-evolution sketch after this list).
Experimental studies on benchmark functions demonstrate the effectiveness of the Ea scheme.
arXiv Detail & Related papers (2020-02-03T09:12:32Z)
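The planning-distillation entry above mentions distilling optimized action sequences into an RL policy; a generic behavior-cloning distillation step is sketched below. The helper distill_step and the plain MSE loss are assumptions, since the paper's theoretically guaranteed scheme is more involved.

```python
import torch
import torch.nn as nn

def distill_step(policy, optimizer, states, planner_actions):
    """Regress the policy onto actions produced by a model-based planner.
    Illustrative only; not the paper's exact loss or weighting."""
    loss = nn.functional.mse_loss(policy(states), planner_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```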
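The ERL-Re$^2$ entry describes one shared nonlinear state representation with individual linear policy representations; a minimal sketch of that architecture is given below. Class names and dimensions are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class SharedStateEncoder(nn.Module):
    """One nonlinear encoder shared by all EA and RL policies."""
    def __init__(self, obs_dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class LinearPolicyHead(nn.Module):
    """Each individual keeps only a linear map on the shared features,
    so evolutionary variation acts in a small parameter space."""
    def __init__(self, feat_dim, act_dim):
        super().__init__()
        self.w = nn.Linear(feat_dim, act_dim)

    def forward(self, feat):
        return torch.tanh(self.w(feat))

# Illustrative sizes (e.g., a MuJoCo-style task): one shared encoder,
# a population of ten cheap linear heads.
encoder = SharedStateEncoder(obs_dim=17)
population = [LinearPolicyHead(feat_dim=128, act_dim=6) for _ in range(10)]
obs = torch.randn(1, 17)
actions = [head(encoder(obs)) for head in population]
```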
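The PMOE entry proposes a probabilistic mixture-of-experts for multimodal policies; the sketch below implements a Gaussian mixture policy whose gating network samples one expert per state. It omits the paper's novel gradient estimator, and every name and size here is an assumption.

```python
import torch
import torch.nn as nn

class MixturePolicy(nn.Module):
    """Gaussian mixture policy: a gating net picks one of k experts,
    each a diagonal Gaussian over actions (gradient estimator omitted)."""
    def __init__(self, obs_dim, act_dim, k=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.gate = nn.Linear(hidden, k)
        self.mu = nn.Linear(hidden, k * act_dim)
        self.log_std = nn.Linear(hidden, k * act_dim)
        self.k, self.act_dim = k, act_dim

    def forward(self, obs):
        h = self.trunk(obs)
        weights = torch.softmax(self.gate(h), dim=-1)            # (B, k)
        idx = torch.distributions.Categorical(weights).sample()  # expert per state
        mu = self.mu(h).view(-1, self.k, self.act_dim)
        std = self.log_std(h).view(-1, self.k, self.act_dim).exp()
        b = torch.arange(obs.shape[0])
        return torch.tanh(mu[b, idx] + std[b, idx] * torch.randn_like(mu[b, idx]))
```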
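The Ea-scheme entry says the method separates multiple strategies and employs them on demand; one way to picture this is a differential-evolution loop that explores with DE/rand/1 and switches to the exploitative DE/best/1 when the best fitness stagnates. The stagnation trigger below is an assumption, not the paper's actual switching rule.

```python
import numpy as np

def de_with_strategy_switch(f, dim, pop_size=30, n_iters=200, F=0.5, cr=0.9,
                            bounds=(-5.0, 5.0), patience=10, seed=0):
    """Differential evolution that swaps its mutation strategy on demand:
    DE/rand/1 (exploration) until progress stalls, then DE/best/1
    (exploitation). Illustrative sketch, not the Ea scheme itself."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([f(x) for x in pop])
    stall, exploit = 0, False
    for _ in range(n_iters):
        prev_best = fit.min()
        best = pop[fit.argmin()]
        for i in range(pop_size):
            r1, r2, r3 = rng.choice(pop_size, 3, replace=False)
            base = best if exploit else pop[r1]        # DE/best/1 vs DE/rand/1
            mutant = np.clip(base + F * (pop[r2] - pop[r3]), lo, hi)
            mask = rng.random(dim) < cr
            mask[rng.integers(dim)] = True             # keep >= 1 mutant gene
            trial = np.where(mask, mutant, pop[i])
            f_trial = f(trial)
            if f_trial < fit[i]:
                pop[i], fit[i] = trial, f_trial
        stall = 0 if fit.min() < prev_best else stall + 1
        exploit = stall >= patience                    # switch on demand
    return pop[fit.argmin()], fit.min()

# Example: best, value = de_with_strategy_switch(lambda x: float(np.sum(x * x)), dim=10)
```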
This list is automatically generated from the titles and abstracts of the papers on this site.