Accelerating Reinforcement Learning with a
Directional-Gaussian-Smoothing Evolution Strategy
- URL: http://arxiv.org/abs/2002.09077v1
- Date: Fri, 21 Feb 2020 01:05:57 GMT
- Title: Accelerating Reinforcement Learning with a
Directional-Gaussian-Smoothing Evolution Strategy
- Authors: Jiaxing Zhang, Hoang Tran, Guannan Zhang
- Abstract summary: Evolution strategy (ES) has shown great promise in many challenging reinforcement learning (RL) tasks.
There are two limitations in current ES practice that may prevent it from reaching its full potential.
In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training.
We show that DGS-ES is highly scalable, possesses superior wall-clock time, and achieves reward scores competitive with other popular policy-gradient and ES approaches.
- Score: 3.404507240556492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolution strategy (ES) has shown great promise in many challenging reinforcement learning (RL) tasks, rivaling other state-of-the-art deep RL methods. Yet, two limitations in current ES practice may prevent it from reaching its full potential. First, most current methods rely on Monte Carlo-type gradient estimators to suggest a search direction, where the policy parameters are, in general, randomly sampled. Due to the low accuracy of such estimators, the RL training may suffer from slow convergence and require more iterations to reach an optimal solution. Second, the landscape of the reward function can be deceptive and contain many local maxima, causing ES algorithms to converge prematurely and fail to explore other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training, which is well suited to address these two challenges with its ability to i) provide gradient estimates with high accuracy, and ii) find a nonlocal search direction that emphasizes large-scale variation of the reward function and disregards local fluctuations. Through several benchmark RL tasks demonstrated herein, we show that DGS-ES is highly scalable, possesses superior wall-clock time, and achieves reward scores competitive with other popular policy-gradient and ES approaches.
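For concreteness, here is a minimal NumPy sketch of the directional-Gaussian-smoothing gradient estimator described in the abstract. It is an illustration under stated assumptions, not the paper's reference implementation: the names reward_fn, num_quad, and basis are hypothetical, the basis defaults to the identity, and the paper's additional machinery (random rotation of the search directions, adaptation of the smoothing radius, distributed rollouts) is omitted.

```python
import numpy as np

def dgs_gradient(reward_fn, theta, sigma=1.0, num_quad=5, basis=None):
    """Directional-Gaussian-Smoothing gradient estimate (illustrative sketch).

    Smooths the reward along each direction of an orthonormal basis,
    differentiates every 1-D smoothed slice with Gauss-Hermite quadrature,
    and maps the directional derivatives back to the original coordinates.
    """
    d = theta.size
    if basis is None:
        basis = np.eye(d)                       # rows are the search directions xi_i
    nodes, weights = np.polynomial.hermite.hermgauss(num_quad)

    directional = np.zeros(d)
    for i, xi in enumerate(basis):
        # Derivative of the 1-D Gaussian smoothing of the reward along xi,
        # estimated with an M-point Gauss-Hermite rule (nodes p_m, weights w_m):
        #   D_i ~ (1 / (sqrt(pi) * sigma)) * sum_m w_m * sqrt(2) * p_m * F(theta + sqrt(2) * sigma * p_m * xi)
        vals = np.array([reward_fn(theta + np.sqrt(2.0) * sigma * p * xi) for p in nodes])
        directional[i] = np.sum(weights * np.sqrt(2.0) * nodes * vals) / (np.sqrt(np.pi) * sigma)

    return basis.T @ directional                # DGS gradient in the original coordinates

# Hypothetical usage: plain gradient ascent on the policy parameters.
# theta = theta + learning_rate * dgs_gradient(episode_return, theta, sigma=2.0)
```

Each call to reward_fn is an independent policy rollout, so the d x num_quad evaluations per iteration can be dispatched in parallel, which is consistent with the scalability and wall-clock claims in the abstract.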
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order optimization algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Soft policy optimization using dual-track advantage estimator [5.4020749513539235]
This paper introduces an entropy term and dynamically sets the temperature coefficient to balance exploration and exploitation.
We propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further enhance the performance of the algorithm.
Compared with other on-policy RL algorithms on the MuJoCo environments, the proposed method achieves state-of-the-art results in cumulative return.
arXiv Detail & Related papers (2020-09-15T04:09:29Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration (a minimal sketch of the upper-confidence-bound action selection appears after this list).
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
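As referenced in the SUNRISE entry above, the sketch below illustrates ensemble-based upper-confidence-bound action selection, one of the two ingredients that summary describes. It assumes a discrete action space and a hypothetical q_ensemble interface (a list of callables q(state, action) -> float); ucb_coef is an illustrative exploration weight, and the weighted Bellman backups are not shown.

```python
import numpy as np

def ucb_action(q_ensemble, state, num_actions, ucb_coef=1.0):
    """Pick the action with the highest upper-confidence bound over a Q-ensemble.

    q_ensemble: list of callables q(state, action) -> float (assumed interface).
    The ensemble standard deviation serves as the uncertainty proxy.
    """
    q_values = np.array([[q(state, a) for a in range(num_actions)]
                         for q in q_ensemble])          # shape: (ensemble, actions)
    mean = q_values.mean(axis=0)                        # ensemble mean Q(s, a)
    std = q_values.std(axis=0)                          # ensemble disagreement
    return int(np.argmax(mean + ucb_coef * std))        # optimistic action choice
```

Per the summary above, the same ensemble-disagreement signal is also what re-weights the Bellman targets in SUNRISE; only the exploration side is sketched here.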
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.