Model-Based Reinforcement Learning via Latent-Space Collocation
- URL: http://arxiv.org/abs/2106.13229v1
- Date: Thu, 24 Jun 2021 17:59:18 GMT
- Title: Model-Based Reinforcement Learning via Latent-Space Collocation
- Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor
Mordatch, Sergey Levine
- Abstract summary: We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state space models.
- Score: 110.04005442935828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to plan into the future while utilizing only raw high-dimensional
observations, such as images, can provide autonomous agents with broad
capabilities. Visual model-based reinforcement learning (RL) methods that plan
future actions directly have shown impressive results on tasks that require
only short-horizon reasoning; however, these methods struggle on temporally
extended tasks. We argue that it is easier to solve long-horizon tasks by
planning sequences of states rather than just actions, as the effects of
actions greatly compound over time and are harder to optimize. To achieve this,
we draw on the idea of collocation, which has shown good results on
long-horizon tasks in the optimal control literature, and adapt it to the
image-based setting by utilizing learned latent state space models. The
resulting latent collocation method (LatCo) optimizes trajectories of latent
states, which improves over previously proposed shooting methods for visual
model-based RL on tasks with sparse rewards and long-term goals. Videos and
code at https://orybkin.github.io/latco/.
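To make the shooting-vs-collocation distinction concrete, here is a minimal, self-contained sketch of latent-space collocation in JAX. It is not the authors' LatCo implementation: the `dynamics` and `reward` functions are toy stand-ins for the learned latent models, and the linearly annealed penalty schedule is an illustrative assumption.

```python
# Minimal latent-space collocation sketch (assumptions throughout):
# `dynamics` and `reward` are toy stand-ins, not learned models.
import jax
import jax.numpy as jnp

LATENT_DIM, ACTION_DIM, HORIZON = 4, 2, 20

def dynamics(z, a):
    # Stand-in for a learned latent transition model z' = f(z, a).
    return z + 0.1 * jnp.tanh(jnp.concatenate([a, a]))

def reward(z):
    # Stand-in for a learned reward model; peaks at a "goal" latent.
    goal = jnp.ones(LATENT_DIM)
    return -jnp.sum((z - goal) ** 2)

def collocation_objective(plan, z0, penalty_weight):
    zs, actions = plan  # zs: (H, D) latent states, actions: (H, A)
    # Reward is evaluated at every planned latent state, so even a
    # distant sparse reward shapes the optimization from step one.
    total_reward = jnp.sum(jax.vmap(reward)(zs))
    # Soft dynamics constraint: consecutive latents must be reachable
    # under the model. Collocation relaxes this early in optimization
    # instead of rolling the model forward as shooting methods do.
    prev = jnp.concatenate([z0[None], zs[:-1]], axis=0)
    violations = zs - jax.vmap(dynamics)(prev, actions)
    return -total_reward + penalty_weight * jnp.sum(violations ** 2)

@jax.jit
def plan_step(plan, z0, penalty_weight, lr=0.05):
    grads = jax.grad(collocation_objective)(plan, z0, penalty_weight)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, plan, grads)

z0 = jnp.zeros(LATENT_DIM)
plan = (jnp.zeros((HORIZON, LATENT_DIM)), jnp.zeros((HORIZON, ACTION_DIM)))
for i in range(500):
    # Anneal the penalty upward so the final plan becomes feasible.
    plan = plan_step(plan, z0, penalty_weight=0.1 * (1.0 + i))
```

Because the reward is scored on every planned latent state regardless of whether it is yet reachable, a distant sparse reward can guide the plan immediately; the dynamics penalty is then tightened until the state sequence is consistent with the model. A shooting method would instead optimize actions only and differentiate through the full rollout, which is where the compounding effects mentioned in the abstract arise.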
Related papers
- Open-World Reinforcement Learning over Long Short-Term Imagination [91.28998327423295]
We present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps.
Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
arXiv Detail & Related papers (2024-10-04T17:17:30Z)
- Diffused Task-Agnostic Milestone Planner [13.042155799536657]
We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
arXiv Detail & Related papers (2023-12-06T10:09:22Z)
- Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning [21.995159117991278]
We propose Curiosity CEM (CCEM), an improved version of the Cross-Entropy Method (CEM) that encourages exploration via curiosity.
Our method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate both future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control Suite show that CCEM is substantially more sample-efficient than previous MBRL algorithms; a minimal CEM planning sketch is included after this list.
arXiv Detail & Related papers (2023-03-07T10:48:20Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Skill-based Model-based Reinforcement Learning [18.758245582997656]
Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors.
We propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space.
We harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space.
arXiv Detail & Related papers (2022-07-15T16:06:33Z)
- Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine how the choice of action space affects instruction-guided navigation in continuous environments.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range of time.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that overcomes these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance while being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
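For reference, below is a minimal sketch of vanilla Cross-Entropy Method (CEM) planning, the shooting-style base planner that the Curiosity CEM entry above improves on. The rollout model and reward are toy stand-ins (assumptions, not any paper's learned components); CCEM would additionally add a curiosity bonus to the scores.

```python
# Minimal CEM planning sketch; `rollout_return` is a toy stand-in.
import jax
import jax.numpy as jnp

ACTION_DIM, HORIZON = 2, 12
POP, ELITES, ITERS = 256, 32, 8

def rollout_return(actions):
    # Toy deterministic environment: the state drifts with the
    # actions, and reward is negative distance to a goal state.
    # A model-based planner would use its learned dynamics here.
    goal = jnp.ones(ACTION_DIM)
    states = jnp.cumsum(0.2 * jnp.tanh(actions), axis=0)
    return -jnp.sum((states - goal) ** 2)

def cem_plan(key, mean, std):
    for _ in range(ITERS):
        key, sub = jax.random.split(key)
        # Sample a population of candidate action sequences.
        samples = mean + std * jax.random.normal(
            sub, (POP, HORIZON, ACTION_DIM))
        scores = jax.vmap(rollout_return)(samples)
        # Refit the sampling distribution to the top-scoring elites.
        elite_idx = jnp.argsort(scores)[-ELITES:]
        elites = samples[elite_idx]
        mean = jnp.mean(elites, axis=0)
        std = jnp.std(elites, axis=0) + 1e-3
    return mean  # in MPC style, only the first action is executed

key = jax.random.PRNGKey(0)
plan = cem_plan(key, jnp.zeros((HORIZON, ACTION_DIM)),
                jnp.ones((HORIZON, ACTION_DIM)))
```

Because CEM optimizes action sequences and scores them by rolling the model forward, it is exactly the kind of shooting method that the LatCo abstract argues struggles on long horizons, which is why the sparse-reward comparisons above are informative.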
This list is automatically generated from the titles and abstracts of the papers on this site.