Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
- URL: http://arxiv.org/abs/2207.13453v1
- Date: Wed, 27 Jul 2022 11:10:50 GMT
- Title: Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
- Authors: Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
- Abstract summary: We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains.
We equip our algorithm with a novel off-policy correction technique that requires no action probability estimates.
We test the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains, aimed at future off-policy deep reinforcement learning applications in which the memory allocated for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We test the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents and exhibits robust performance when the replay memory is strictly limited.
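The abstract describes the mechanism only at a high level, so the following is a minimal sketch rather than the authors' implementation: each agent keeps a small fixed-capacity buffer, and transitions received from another agent are accepted only if the stored action stays close to what the receiver's current deterministic policy would output in the same state. The action-distance filter, the threshold action_threshold, and the helper share_experiences are illustrative assumptions; the paper only characterizes its off-policy correction as needing no action probability estimates.

import numpy as np
from collections import deque

class ReplayBuffer:
    # Fixed-capacity FIFO buffer; the small capacity mimics the strictly
    # limited replay memory discussed in the abstract.
    def __init__(self, capacity=1_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

def share_experiences(sender_buffer, receiver_buffer, receiver_policy,
                      action_threshold=0.2):
    # Copy another agent's transitions into the receiver's buffer, keeping
    # only those whose stored action is close to the receiver's own
    # deterministic action pi(s). This distance-based filter is a
    # hypothetical stand-in for the paper's off-policy correction.
    kept = 0
    for state, action, reward, next_state, done in sender_buffer.storage:
        own_action = receiver_policy(state)
        if np.linalg.norm(own_action - action) <= action_threshold:
            receiver_buffer.add(state, action, reward, next_state, done)
            kept += 1
    return kept

# Toy usage with two agents and a 2-D action space.
rng = np.random.default_rng(0)
agent_a, agent_b = ReplayBuffer(), ReplayBuffer()
policy_b = lambda s: np.tanh(s[:2])          # hypothetical deterministic actor of agent B

for _ in range(500):                         # agent A collects its own experience
    s = rng.normal(size=4)
    a = np.tanh(s[:2]) + 0.3 * rng.normal(size=2)   # A's behaviour differs from B's
    agent_a.add(s, a, rng.normal(), rng.normal(size=4), False)

kept = share_experiences(agent_a, agent_b, policy_b)
print(f"agent B accepted {kept} of {len(agent_a.storage)} shared transitions")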
Related papers
- State-Novelty Guided Action Persistence in Deep Reinforcement Learning [7.05832012052375]
We propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
arXiv Detail & Related papers (2024-09-09T08:34:22Z)
- Learning Uncertainty-Aware Temporally-Extended Actions [22.901453123868674]
We propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE)
UTE employs ensemble methods to accurately measure uncertainty during action extension.
We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments.
arXiv Detail & Related papers (2024-02-08T06:32:06Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning [16.457330925212606]
We present adaptive-experience replay (AdaER) to address the challenge of continual lifelong learning.
AdaER consists of two stages: memory replay and memory update.
Results: AdaER outperforms existing continual lifelong learning baselines.
arXiv Detail & Related papers (2023-08-07T01:25:45Z)
- TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning [33.512849582347734]
We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
arXiv Detail & Related papers (2022-05-26T17:49:12Z)
- Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship [54.73817402934303]
We propose Experience Continual Replay (ERR), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
ERR can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected; a schematic illustration of these two knobs appears after this list.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
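Following up on the Revisiting Fundamentals of Experience Replay entry above, the schematic below only illustrates where its two properties typically appear in an off-policy training loop: the buffer size (replay capacity) and the number of gradient updates per collected transition (replay ratio). The function names, arguments, and defaults are hypothetical and not taken from that paper.

import numpy as np
from collections import deque

def train_loop(collect_transition, gradient_update,
               replay_capacity=100_000, replay_ratio=1,
               total_env_steps=10_000, batch_size=32, seed=0):
    # Schematic off-policy loop: replay_capacity bounds the buffer,
    # replay_ratio sets gradient updates per environment step.
    rng = np.random.default_rng(seed)
    buffer = deque(maxlen=replay_capacity)          # replay capacity
    for _ in range(total_env_steps):
        buffer.append(collect_transition())         # gather one transition
        for _ in range(int(replay_ratio)):          # replay-ratio updates
            if len(buffer) >= batch_size:
                idx = rng.integers(len(buffer), size=batch_size)
                gradient_update([buffer[i] for i in idx])   # one update on a sampled batch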
This list is automatically generated from the titles and abstracts of the papers on this site.