Goal-conditioned Offline Planning from Curious Exploration
- URL: http://arxiv.org/abs/2311.16996v1
- Date: Tue, 28 Nov 2023 17:48:18 GMT
- Title: Goal-conditioned Offline Planning from Curious Exploration
- Authors: Marco Bagatella, Georg Martius
- Abstract summary: We consider the challenge of extracting goal-conditioned behavior from the products of unsupervised exploration techniques.
We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting.
We trace this failure to estimation artifacts in learned value functions and, to mitigate them, propose combining model-based planning over learned value landscapes with a graph-based value aggregation scheme.
- Score: 28.953718733443143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Curiosity has established itself as a powerful exploration strategy in deep
reinforcement learning. Notably, leveraging expected future novelty as
intrinsic motivation has been shown to efficiently generate exploratory
trajectories, as well as a robust dynamics model. We consider the challenge of
extracting goal-conditioned behavior from the products of such unsupervised
exploration techniques, without any additional environment interaction. We find
that conventional goal-conditioned reinforcement learning approaches for
extracting a value function and policy fall short in this difficult offline
setting. By analyzing the geometry of optimal goal-conditioned value functions,
we relate this issue to a specific class of estimation artifacts in learned
values. In order to mitigate their occurrence, we propose to combine
model-based planning over learned value landscapes with a graph-based value
aggregation scheme. We show how this combination can correct both local and
global artifacts, obtaining significant improvements in zero-shot goal-reaching
performance across diverse simulated environments.
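The abstract describes the recipe only at a high level. Below is a minimal NumPy sketch of one plausible instantiation, assuming the standard geometry V*(s, g) = gamma^{d(s, g)} so that learned values can be converted to step distances, composed over a landmark graph, and used to score imagined rollouts. The landmark set, the distance threshold, the Floyd-Warshall aggregation, and the planner interface are illustrative assumptions, not the authors' exact algorithm.

```python
# Minimal NumPy sketch: graph-based aggregation of a learned goal-conditioned
# value over landmark states, followed by planning with a learned dynamics
# model. Names, thresholds and interfaces are illustrative assumptions.
import numpy as np

GAMMA = 0.99  # discount factor assumed for the gamma^d value geometry


def value_to_distance(v, eps=1e-6):
    # Under V*(s, g) = gamma^{d(s, g)}, a learned value maps to a step count.
    return np.log(np.clip(v, eps, 1.0)) / np.log(GAMMA)


def aggregate_values(landmarks, value_fn, max_local_dist=10.0):
    """Keep only short-range ("trusted") pairwise estimates between landmarks
    and recompute long-range distances as shortest paths over those edges."""
    n = len(landmarks)
    dist = np.array([[value_to_distance(value_fn(si, sj)) for sj in landmarks]
                     for si in landmarks])
    dist[dist > max_local_dist] = np.inf  # distrust long-range direct estimates
    np.fill_diagonal(dist, 0.0)
    for k in range(n):  # Floyd-Warshall shortest paths
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return GAMMA ** dist  # back to value space (unreachable pairs -> 0)


def plan(dynamics_model, value_fn, state, goal, candidate_action_seqs):
    """Score candidate action sequences by rolling out a learned dynamics model
    and evaluating the (graph-corrected) value of the imagined final state.
    In practice, value queries off the landmark set would be routed through
    the nearest landmarks."""
    scores = []
    for actions in candidate_action_seqs:
        s = state
        for a in actions:
            s = dynamics_model(s, a)
        scores.append(value_fn(s, goal))
    return candidate_action_seqs[int(np.argmax(scores))]
```

The design intent illustrated here is that the learned value is trusted only over short horizons, while long-horizon estimates come from composing local edges, which is what suppresses spuriously optimistic direct estimates.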
Related papers
- A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL)
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Implicit Training of Energy Model for Structure Prediction [14.360826930970765]
In this work, we argue that existing inference-network-based structure prediction methods indirectly learn to optimize a dynamic loss objective parameterized by the energy model.
We then explore an implicit-gradient-based technique to learn the corresponding dynamic objectives.
arXiv Detail & Related papers (2022-11-21T17:08:44Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Provable Representation Learning for Imitation with Contrastive Fourier Features [27.74988221252854]
We consider using offline experience datasets to learn low-dimensional state representations.
A central challenge is that the unknown target policy itself may not exhibit low-dimensional behavior.
We derive a representation learning objective which provides an upper bound on the performance difference between the target policy and a low-dimensional policy trained with max-likelihood.
arXiv Detail & Related papers (2021-05-26T00:31:30Z)
- Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment transitions.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
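As a rough illustration of the goal-aware prediction idea above, the sketch below trains a goal-conditioned forward model whose loss only penalizes errors on goal-relevant state dimensions. The fixed relevance mask and the simple MLP are assumptions for illustration, not the paper's learned objective; a faithful implementation would replace the mask with a goal-dependent weighting.

```python
# Illustrative only: a goal-conditioned forward model trained with a loss that
# is restricted to goal-relevant state dimensions. The fixed relevance mask is
# a hand-specified stand-in for a learned, goal-dependent weighting.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GOAL_DIM = 16, 4, 16


class GoalConditionedDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + GOAL_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )

    def forward(self, state, action, goal):
        return self.net(torch.cat([state, action, goal], dim=-1))


def goal_weighted_loss(pred_next, true_next, relevance_mask):
    # Only goal-relevant dimensions contribute to the prediction error, so the
    # model is free to ignore task-irrelevant parts of the state.
    return ((pred_next - true_next) ** 2 * relevance_mask).mean()


if __name__ == "__main__":
    model = GoalConditionedDynamics()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    s = torch.randn(32, STATE_DIM)
    a = torch.randn(32, ACTION_DIM)
    g = torch.randn(32, GOAL_DIM)
    s_next = torch.randn(32, STATE_DIM)
    mask = torch.zeros(STATE_DIM)
    mask[:4] = 1.0  # hypothetical: only the first 4 dimensions matter for this goal
    loss = goal_weighted_loss(model(s, a, g), s_next, mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
```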