Accelerating Reinforcement Learning with a
Directional-Gaussian-Smoothing Evolution Strategy
- URL: http://arxiv.org/abs/2002.09077v1
- Date: Fri, 21 Feb 2020 01:05:57 GMT
- Title: Accelerating Reinforcement Learning with a
Directional-Gaussian-Smoothing Evolution Strategy
- Authors: Jiaxing Zhang, Hoang Tran, Guannan Zhang
- Abstract summary: Evolution strategy (ES) has shown great promise in many challenging reinforcement learning (RL) tasks.
There are two limitations in current ES practice that may prevent it from reaching its full potential.
In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training.
We show that DGS-ES is highly scalable, possesses superior wall-clock time, and achieves reward scores competitive with other popular policy-gradient and ES approaches.
- Score: 3.404507240556492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolution strategy (ES) has shown great promise in many challenging reinforcement learning (RL) tasks, rivaling other state-of-the-art deep RL methods. Yet, two limitations in current ES practice may prevent it from reaching its full potential. First, most current methods rely on Monte Carlo-type gradient estimators to suggest a search direction, where the policy parameters are, in general, randomly sampled. Due to the low accuracy of such estimators, the RL training may suffer from slow convergence and require more iterations to reach an optimal solution. Second, the landscape of the reward function can be deceptive and contain many local maxima, causing ES algorithms to converge prematurely and fail to explore other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training, which is well suited to address these two challenges with its ability to i) provide gradient estimates with high accuracy, and ii) find a nonlocal search direction that emphasizes large-scale variation of the reward function and disregards local fluctuations. Through several benchmark RL tasks demonstrated herein, we show that DGS-ES is highly scalable, possesses superior wall-clock time, and achieves reward scores competitive with other popular policy-gradient and ES approaches.
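For concreteness, here is a minimal NumPy sketch of the directional-Gaussian-smoothing gradient estimator described in the abstract. It is an illustration under stated assumptions, not the paper's reference implementation: the names reward_fn, num_quad, and basis are hypothetical, the basis defaults to the identity, and the paper's additional machinery (random rotation of the search directions, adaptation of the smoothing radius, distributed rollouts) is omitted.

```python
import numpy as np

def dgs_gradient(reward_fn, theta, sigma=1.0, num_quad=5, basis=None):
    """Directional-Gaussian-Smoothing gradient estimate (illustrative sketch).

    Smooths the reward along each direction of an orthonormal basis,
    differentiates every 1-D smoothed slice with Gauss-Hermite quadrature,
    and maps the directional derivatives back to the original coordinates.
    """
    d = theta.size
    if basis is None:
        basis = np.eye(d)                       # rows are the search directions xi_i
    nodes, weights = np.polynomial.hermite.hermgauss(num_quad)

    directional = np.zeros(d)
    for i, xi in enumerate(basis):
        # Derivative of the 1-D Gaussian smoothing of the reward along xi,
        # estimated with an M-point Gauss-Hermite rule (nodes p_m, weights w_m):
        #   D_i ~ (1 / (sqrt(pi) * sigma)) * sum_m w_m * sqrt(2) * p_m * F(theta + sqrt(2) * sigma * p_m * xi)
        vals = np.array([reward_fn(theta + np.sqrt(2.0) * sigma * p * xi) for p in nodes])
        directional[i] = np.sum(weights * np.sqrt(2.0) * nodes * vals) / (np.sqrt(np.pi) * sigma)

    return basis.T @ directional                # DGS gradient in the original coordinates

# Hypothetical usage: plain gradient ascent on the policy parameters.
# theta = theta + learning_rate * dgs_gradient(episode_return, theta, sigma=2.0)
```

Each call to reward_fn is an independent policy rollout, so the d x num_quad evaluations per iteration can be dispatched in parallel, which is consistent with the scalability and wall-clock claims in the abstract.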
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order optimization algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Soft policy optimization using dual-track advantage estimator [5.4020749513539235]
This paper introduces an entropy term and dynamically sets the temperature coefficient to balance exploration and exploitation.
We propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further enhance the performance of the algorithm.
Compared with other on-policy RL algorithms on the MuJoCo environments, the proposed method achieves state-of-the-art results in cumulative return.
arXiv Detail & Related papers (2020-09-15T04:09:29Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration (a minimal sketch of the upper-confidence-bound action selection appears after this list).
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
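As referenced in the SUNRISE entry above, the sketch below illustrates ensemble-based upper-confidence-bound action selection, one of the two ingredients that summary describes. It assumes a discrete action space and a hypothetical q_ensemble interface (a list of callables q(state, action) -> float); ucb_coef is an illustrative exploration weight, and the weighted Bellman backups are not shown.

```python
import numpy as np

def ucb_action(q_ensemble, state, num_actions, ucb_coef=1.0):
    """Pick the action with the highest upper-confidence bound over a Q-ensemble.

    q_ensemble: list of callables q(state, action) -> float (assumed interface).
    The ensemble standard deviation serves as the uncertainty proxy.
    """
    q_values = np.array([[q(state, a) for a in range(num_actions)]
                         for q in q_ensemble])          # shape: (ensemble, actions)
    mean = q_values.mean(axis=0)                        # ensemble mean Q(s, a)
    std = q_values.std(axis=0)                          # ensemble disagreement
    return int(np.argmax(mean + ucb_coef * std))        # optimistic action choice
```

Per the summary above, the same ensemble-disagreement signal is also what re-weights the Bellman targets in SUNRISE; only the exploration side is sketched here.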
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.