Related papers: An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

URL: http://arxiv.org/abs/2012.05417v2
Date: Wed, 6 Jan 2021 05:12:47 GMT
Title: An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
Authors: Kyunghyun Lee, Byeong-Uk Lee, Ukcheol Shin and In So Kweon
Abstract summary: We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take all advantages of asynchronism, ES, and DRL.
Score: 76.73477450555046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks, showing excellent performance. They have opposite properties: DRL has good sample efficiency but poor stability, while ES is the reverse. Recently, there have been attempts to combine these algorithms, but these methods rely entirely on a synchronous update scheme, which makes it hard to fully exploit the parallelism of ES. To solve this challenge, an asynchronous update scheme was introduced, which is capable of good time efficiency and diverse policy exploration. In this paper, we introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take all advantages of asynchronism, ES, and DRL, which are exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmarks, showing superior performance as well as time efficiency compared to the previous methods.

Related papers

Nesterov Method for Asynchronous Pipeline Parallel Optimization [59.79227116582264]
We introduce a variant of Nesterov Accelerated Gradient (NAG) for asynchronous optimization in Pipeline Parallelism. Specifically, we modify the look-ahead step in NAG to effectively address the staleness in gradients. We theoretically prove that our approach converges at a sublinear rate in the presence of fixed delay in gradients.
arXiv Detail & Related papers (2025-05-02T08:23:29Z)
AlphaAdam: Asynchronous Masked Optimization with Dynamic Alpha for Selective Updates [17.490809667438818]
We propose AlphaAdam, an optimization framework for large language models (LLMs). By decoupling parameter updates and dynamically adjusting their strength, AlphaAdam accelerates convergence and improves training stability.
arXiv Detail & Related papers (2025-01-30T02:10:23Z)
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins. We employ inverse RL (IRL) to automatically learn reward functions without manual tuning. We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning. This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees. We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z)
AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting. As a by-product of our analysis, we also demonstrate guarantees for gradient-type algorithms such as SGD with random tightness.
arXiv Detail & Related papers (2023-10-31T13:44:53Z)
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses and stragglers. We propose a Robust Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of impact.
arXiv Detail & Related papers (2023-07-21T14:36:40Z)
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates [28.813671194939225]
Fully decentralized optimization methods have been advocated as alternatives to the popular parameter server framework. We propose a fully decentralized algorithm with adaptive asynchronous updates that adaptively determines the number of neighbor workers each worker communicates with. We show that DSGD-AAU achieves a linear speedup for convergence and demonstrate its effectiveness via extensive experiments.
arXiv Detail & Related papers (2023-06-11T02:08:59Z)
Progressive extension of reinforcement learning action dimension for asymmetric assembly tasks [7.4642148614421995]
In this paper, a progressive extension of action dimension (PEAD) mechanism is proposed to optimize the convergence of RL algorithms. The results demonstrate the PEAD method will enhance the data-efficiency and time-efficiency of RL algorithms as well as increase the stable reward.
arXiv Detail & Related papers (2021-04-06T11:48:54Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
High-Throughput Synchronous Deep RL [132.43861715707905]
We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). We perform learning and rollouts concurrently and devise a system design which avoids stale policies. We evaluate our approach on Atari games and the Google Research Football environment.
arXiv Detail & Related papers (2020-12-17T18:59:01Z)
EOS: a Parallel, Self-Adaptive, Multi-Population Evolutionary Algorithm for Constrained Global Optimization [68.8204255655161]
EOS is a global optimization algorithm for constrained and unconstrained problems of real-valued variables. It implements a number of improvements to the well-known Differential Evolution (DE) algorithm. Results prove that EOS is capable of achieving increased performance compared to state-of-the-art single-population self-adaptive DE algorithms.
arXiv Detail & Related papers (2020-07-09T10:19:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.