An Efficient Asynchronous Method for Integrating Evolutionary and
Gradient-based Policy Search
- URL: http://arxiv.org/abs/2012.05417v2
- Date: Wed, 6 Jan 2021 05:12:47 GMT
- Title: An Efficient Asynchronous Method for Integrating Evolutionary and
Gradient-based Policy Search
- Authors: Kyunghyun Lee, Byeong-Uk Lee, Ukcheol Shin and In So Kweon
- Abstract summary: We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods.
Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take all advantages of asynchronism, ES, and DRL.
- Score: 76.73477450555046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) algorithms and evolution strategies (ES)
have been applied to various tasks, showing excellent performances. These have
the opposite properties, with DRL having good sample efficiency and poor
stability, while ES being vice versa. Recently, there have been attempts to
combine these algorithms, but these methods fully rely on synchronous update
scheme, making it not ideal to maximize the benefits of the parallelism in ES.
To solve this challenge, asynchronous update scheme was introduced, which is
capable of good time-efficiency and diverse policy exploration. In this paper,
we introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL)
that maximizes the parallel efficiency of ES and integrates it with policy
gradient methods. Specifically, we propose 1) a novel framework to merge ES and
DRL asynchronously and 2) various asynchronous update methods that can take all
advantages of asynchronism, ES, and DRL, which are exploration and time
efficiency, stability, and sample efficiency, respectively. The proposed
framework and update methods are evaluated in continuous control benchmark
work, showing superior performance as well as time efficiency compared to the
previous methods.
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z) - AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate guarantees for gradient-type algorithms such as SGD with random tightness.
arXiv Detail & Related papers (2023-10-31T13:44:53Z) - Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, package losses and stragglers.
We propose Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own without any form of impact.
arXiv Detail & Related papers (2023-07-21T14:36:40Z) - Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates [28.813671194939225]
fully decentralized optimization methods have been advocated as alternatives to the popular parameter server framework.
We propose a fully decentralized algorithm with adaptive asynchronous updates via adaptively determining the number of neighbor workers for each worker to communicate with.
We show that DSGD-AAU achieves a linear speedup for convergence and demonstrate its effectiveness via extensive experiments.
arXiv Detail & Related papers (2023-06-11T02:08:59Z) - Progressive extension of reinforcement learning action dimension for
asymmetric assembly tasks [7.4642148614421995]
In this paper, a progressive extension of action dimension (PEAD) mechanism is proposed to optimize the convergence of RL algorithms.
The results demonstrate the PEAD method will enhance the data-efficiency and time-efficiency of RL algorithms as well as increase the stable reward.
arXiv Detail & Related papers (2021-04-06T11:48:54Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - High-Throughput Synchronous Deep RL [132.43861715707905]
We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL)
We perform learning and rollouts concurrently, devise a system design which avoids stale policies'
We evaluate our approach on Atari games and the Google Research Football environment.
arXiv Detail & Related papers (2020-12-17T18:59:01Z) - EOS: a Parallel, Self-Adaptive, Multi-Population Evolutionary Algorithm
for Constrained Global Optimization [68.8204255655161]
EOS is a global optimization algorithm for constrained and unconstrained problems of real-valued variables.
It implements a number of improvements to the well-known Differential Evolution (DE) algorithm.
Results prove that EOSis capable of achieving increased performance compared to state-of-the-art single-population self-adaptive DE algorithms.
arXiv Detail & Related papers (2020-07-09T10:19:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.