Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
- URL: http://arxiv.org/abs/2207.13453v1
- Date: Wed, 27 Jul 2022 11:10:50 GMT
- Title: Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
- Authors: Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat
- Abstract summary: We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains.
We equip our algorithm with a novel off-policy correction technique that requires no action probability estimates.
We test the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains, aimed at future off-policy deep reinforcement learning applications in which the memory allocated for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We test the effectiveness of our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents and exhibits robust performance when the replay memory is strictly limited.
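The abstract describes the mechanism only at a high level, so the following is a minimal sketch rather than the authors' implementation: each agent keeps a small fixed-capacity buffer, and transitions received from another agent are accepted only if the stored action stays close to what the receiver's current deterministic policy would output in the same state. The action-distance filter, the threshold action_threshold, and the helper share_experiences are illustrative assumptions; the paper only characterizes its off-policy correction as needing no action probability estimates.

import numpy as np
from collections import deque

class ReplayBuffer:
    # Fixed-capacity FIFO buffer; the small capacity mimics the strictly
    # limited replay memory discussed in the abstract.
    def __init__(self, capacity=1_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

def share_experiences(sender_buffer, receiver_buffer, receiver_policy,
                      action_threshold=0.2):
    # Copy another agent's transitions into the receiver's buffer, keeping
    # only those whose stored action is close to the receiver's own
    # deterministic action pi(s). This distance-based filter is a
    # hypothetical stand-in for the paper's off-policy correction.
    kept = 0
    for state, action, reward, next_state, done in sender_buffer.storage:
        own_action = receiver_policy(state)
        if np.linalg.norm(own_action - action) <= action_threshold:
            receiver_buffer.add(state, action, reward, next_state, done)
            kept += 1
    return kept

# Toy usage with two agents and a 2-D action space.
rng = np.random.default_rng(0)
agent_a, agent_b = ReplayBuffer(), ReplayBuffer()
policy_b = lambda s: np.tanh(s[:2])          # hypothetical deterministic actor of agent B

for _ in range(500):                         # agent A collects its own experience
    s = rng.normal(size=4)
    a = np.tanh(s[:2]) + 0.3 * rng.normal(size=2)   # A's behaviour differs from B's
    agent_a.add(s, a, rng.normal(), rng.normal(size=4), False)

kept = share_experiences(agent_a, agent_b, policy_b)
print(f"agent B accepted {kept} of {len(agent_a.storage)} shared transitions")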
Related papers
- State-Novelty Guided Action Persistence in Deep Reinforcement Learning [7.05832012052375]
We propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
arXiv Detail & Related papers (2024-09-09T08:34:22Z)
- Learning Uncertainty-Aware Temporally-Extended Actions [22.901453123868674]
We propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE)
UTE employs ensemble methods to accurately measure uncertainty during action extension.
We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments.
arXiv Detail & Related papers (2024-02-08T06:32:06Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning [16.457330925212606]
We present adaptive-experience replay (AdaER) to address the challenge of continual lifelong learning.
AdaER consists of two stages: memory replay and memory update.
Results: AdaER outperforms existing continual lifelong learning baselines.
arXiv Detail & Related papers (2023-08-07T01:25:45Z)
- TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning [33.512849582347734]
We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
arXiv Detail & Related papers (2022-05-26T17:49:12Z)
- Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship [54.73817402934303]
We propose Experience Continual Replay (ERR), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
ERR can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected; a schematic illustration of these two knobs appears after this list.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
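Following up on the Revisiting Fundamentals of Experience Replay entry above, the schematic below only illustrates where its two properties typically appear in an off-policy training loop: the buffer size (replay capacity) and the number of gradient updates per collected transition (replay ratio). The function names, arguments, and defaults are hypothetical and not taken from that paper.

import numpy as np
from collections import deque

def train_loop(collect_transition, gradient_update,
               replay_capacity=100_000, replay_ratio=1,
               total_env_steps=10_000, batch_size=32, seed=0):
    # Schematic off-policy loop: replay_capacity bounds the buffer,
    # replay_ratio sets gradient updates per environment step.
    rng = np.random.default_rng(seed)
    buffer = deque(maxlen=replay_capacity)          # replay capacity
    for _ in range(total_env_steps):
        buffer.append(collect_transition())         # gather one transition
        for _ in range(int(replay_ratio)):          # replay-ratio updates
            if len(buffer) >= batch_size:
                idx = rng.integers(len(buffer), size=batch_size)
                gradient_update([buffer[i] for i in idx])   # one update on a sampled batch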
This list is automatically generated from the titles and abstracts of the papers on this site.