TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2205.13528v1
- Date: Thu, 26 May 2022 17:49:12 GMT
- Title: TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning
- Authors: Marco Bagatella, Sammy Christen and Otmar Hilliges
- Abstract summary: We propose to learn features from offline data that are shared by a more diverse range of tasks.
We introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories.
We also introduce a novel integration scheme for action priors in off-policy reinforcement learning.
- Score: 33.512849582347734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient exploration is a crucial challenge in deep reinforcement learning.
Several methods, such as behavioral priors, are able to leverage offline data
in order to efficiently accelerate reinforcement learning on complex tasks.
However, if the task at hand deviates excessively from the demonstrated task,
the effectiveness of such methods is limited. In our work, we propose to learn
features from offline data that are shared by a more diverse range of tasks,
such as correlation between actions and directedness. Therefore, we introduce
state-independent temporal priors, which directly model temporal consistency in
demonstrated trajectories, and are capable of driving exploration in complex
tasks, even when trained on data collected on simpler tasks. Furthermore, we
introduce a novel integration scheme for action priors in off-policy
reinforcement learning by dynamically sampling actions from a probabilistic
mixture of policy and action prior. We compare our approach against strong
baselines and provide empirical evidence that it can accelerate reinforcement
learning in long-horizon continuous control tasks under sparse reward settings.
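To make the two ideas concrete, here is a minimal PyTorch-style sketch (illustrative, not the authors' code): a state-independent temporal prior that models p(a_t | a_{t-1}) on demonstrated trajectories, and exploration that draws each action from a probabilistic mixture of the current policy and that prior. The class names, network sizes, and the fixed `mix_prob` are assumptions; the paper's dynamic sampling scheme may differ.

```python
import torch
import torch.nn as nn

class TemporalPrior(nn.Module):
    """State-independent prior p(a_t | a_{t-1}) capturing temporal consistency."""
    def __init__(self, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),  # mean and log-std of a_t
        )

    def forward(self, prev_action):
        mean, log_std = self.net(prev_action).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())

def prior_nll(prior, prev_actions, actions):
    """Maximum-likelihood training signal on consecutive action pairs."""
    return -prior(prev_actions).log_prob(actions).sum(-1).mean()

def explore_action(policy_dist, prior, prev_action, mix_prob=0.5):
    """Sample a_t from a probabilistic mixture of policy and temporal prior."""
    if torch.rand(()) < mix_prob:  # fixed mixing weight here; the paper's is dynamic
        return prior(prev_action).sample()
    return policy_dist.sample()
```

Because the prior conditions only on the previous action and never on the state, it can be trained on data from simple tasks and still drive exploration on harder ones.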
Related papers
- State-Novelty Guided Action Persistence in Deep Reinforcement Learning [7.05832012052375]
We propose a novel method that dynamically adjusts action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
arXiv Detail & Related papers (2024-09-09T08:34:22Z)
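Since this blurb describes a concrete mechanism, a small illustrative sketch may help; the count-based novelty estimate and the linear schedule below are assumptions, not necessarily the paper's choices.

```python
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def novelty(state, precision=1):
    """Hypothetical count-based novelty: high for rarely visited states.
    Note the side effect: each call registers one visit."""
    key = tuple(np.round(state, precision))
    visit_counts[key] += 1
    return 1.0 / np.sqrt(visit_counts[key])

def persistence(state, k_min=1, k_max=8):
    """One plausible schedule: commit to actions longer in familiar regions,
    react faster (shorter persistence) where states are novel."""
    n = novelty(state)
    return int(np.clip(k_max * (1.0 - n), k_min, k_max))
```

An agent would then hold each sampled action for persistence(state) consecutive environment steps.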
- Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking [7.590209768166108]
We introduce three continuous action masking methods for mapping the action space to the state-dependent set of relevant actions.
Our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications.
arXiv Detail & Related papers (2024-06-06T02:55:16Z)
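As a rough illustration of the idea (the paper proposes three concrete masking methods; this projection onto a state-dependent box is a simplified stand-in, and relevant_bounds is a hypothetical relevance model):

```python
import numpy as np

def relevant_bounds(state):
    """Placeholder relevance model: per-dimension (low, high) bounds of the
    state-dependent set of relevant actions (2-D action space assumed)."""
    return -0.5 * np.ones(2), 0.5 * np.ones(2)

def mask_action(state, raw_action):
    """Project the policy's raw action into the relevant set, so only
    relevant actions are ever executed."""
    low, high = relevant_bounds(state)
    return np.clip(raw_action, low, high)
```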
- The Effect of Task Ordering in Continual Learning [12.571389210876315]
We show that reordering tasks significantly affects the amount of catastrophic forgetting.
We show that the effect of task ordering can be exploited to modify continual learning performance.
arXiv Detail & Related papers (2022-05-26T12:56:15Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship [54.73817402934303]
We propose Relational Experience Replay (RER), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
RER consistently improves the performance of all baselines and surpasses current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
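Parrot-style priors let the RL agent act in a latent space that an invertible, state-conditioned map turns into environment actions. The single affine transform below is a deliberately simplified stand-in for the paper's richer generative model; all names are illustrative.

```python
import torch
import torch.nn as nn

class AffineBehavioralPrior(nn.Module):
    """Simplified behavioral prior: maps a latent z to an action, conditioned
    on the state, via a single (invertible) affine transform."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),  # per-dim log-scale and shift
        )

    def forward(self, state, z):
        log_scale, shift = self.net(state).chunk(2, dim=-1)
        return z * log_scale.exp() + shift  # invertible in z for every state
```

The prior is pre-trained so that z drawn from a standard normal maps to high-likelihood expert actions; because the map is invertible in z, any action remains reachable, which is how the prior accelerates learning without blocking novel behaviors.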
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
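A generic sketch of the ingredients named here, combining clipped importance weights for off-policy correction with a KL penalty toward the behavior prior; the paper's actual objective may differ, and every name below is illustrative.

```python
import torch

def policy_loss(policy_dist, prior_dist, behavior_log_prob, actions,
                advantages, kl_weight=0.1, clip=10.0):
    log_pi = policy_dist.log_prob(actions).sum(-1)
    # Importance weight corrects for data collected under the behavior policy;
    # it is clipped and detached, acting as a constant multiplier.
    w = (log_pi - behavior_log_prob).exp().clamp(max=clip).detach()
    # Regularize the policy toward the behavior prior.
    kl = torch.distributions.kl_divergence(policy_dist, prior_dist).sum(-1)
    return -(w * advantages * log_pi).mean() + kl_weight * kl.mean()
```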
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
We present Plan2Explore, a self-supervised reinforcement learning agent that takes a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
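The exploration signal behind Plan2Explore can be sketched as ensemble disagreement: several one-step latent-dynamics models are trained, and the variance of their predictions serves as an intrinsic reward for seeking states the world model is uncertain about. Shapes, names, and the surrounding training loop are assumptions rather than the paper's code.

```python
import torch

def intrinsic_reward(ensemble, latent, action):
    """ensemble: list of models mapping (latent, action) -> next-latent mean."""
    preds = torch.stack([m(latent, action) for m in ensemble])  # [E, B, D]
    return preds.var(dim=0).mean(dim=-1)                        # [B]; high where models disagree
```

During self-supervised training the agent plans in the world model to maximize this reward, then adapts quickly once a task reward is revealed.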