Learning Goal-Conditioned Policies Offline with Self-Supervised Reward
Shaping
- URL: http://arxiv.org/abs/2301.02099v1
- Date: Thu, 5 Jan 2023 15:07:10 GMT
- Title: Learning Goal-Conditioned Policies Offline with Self-Supervised Reward
Shaping
- Authors: Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro
Lazaric, Karteek Alahari
- Abstract summary: We propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and the dynamics of the model.
We evaluate our method on three continuous control tasks, and show that our model significantly outperforms existing approaches.
- Score: 94.89128390954572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing agents that can execute multiple skills by learning from
pre-collected datasets is an important problem in robotics, where online
interaction with the environment is extremely time-consuming. Moreover,
manually designing reward functions for every single desired skill is
prohibitive. Prior works targeted these challenges by learning goal-conditioned
policies from offline datasets without manually specified rewards, through
hindsight relabelling. These methods suffer from the issue of sparsity of
rewards, and fail at long-horizon tasks. In this work, we propose a novel
self-supervised learning phase on the pre-collected dataset to understand the
structure and the dynamics of the model, and shape a dense reward function for
learning policies offline. We evaluate our method on three continuous control
tasks, and show that our model significantly outperforms existing approaches,
especially on tasks that involve long-term planning.
Related papers
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Self-Supervised Reinforcement Learning that Transfers using Random
Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z) - Learning and Retrieval from Prior Data for Skill-based Imitation
Learning [47.59794569496233]
We develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data.
We identify several key design choices that significantly improve performance on novel tasks.
arXiv Detail & Related papers (2022-10-20T17:34:59Z) - Generalization with Lossy Affordances: Leveraging Broad Offline Data for
Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
arXiv Detail & Related papers (2020-11-14T03:38:38Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.