Model-Based Reinforcement Learning via Latent-Space Collocation
- URL: http://arxiv.org/abs/2106.13229v1
- Date: Thu, 24 Jun 2021 17:59:18 GMT
- Title: Model-Based Reinforcement Learning via Latent-Space Collocation
- Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor
Mordatch, Sergey Levine
- Abstract summary: We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state space models.
- Score: 110.04005442935828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to plan into the future while utilizing only raw high-dimensional
observations, such as images, can provide autonomous agents with broad
capabilities. Visual model-based reinforcement learning (RL) methods that plan
future actions directly have shown impressive results on tasks that require
only short-horizon reasoning; however, these methods struggle on temporally
extended tasks. We argue that it is easier to solve long-horizon tasks by
planning sequences of states rather than just actions, as the effects of
actions greatly compound over time and are harder to optimize. To achieve this,
we draw on the idea of collocation, which has shown good results on
long-horizon tasks in the optimal control literature, and adapt it to the
image-based setting by utilizing learned latent state space models. The
resulting latent collocation method (LatCo) optimizes trajectories of latent
states, which improves over previously proposed shooting methods for visual
model-based RL on tasks with sparse rewards and long-term goals. Videos and
code at https://orybkin.github.io/latco/.
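To make the shooting-vs-collocation distinction concrete, here is a minimal, self-contained sketch of latent-space collocation in JAX. It is not the authors' LatCo implementation: the `dynamics` and `reward` functions are toy stand-ins for the learned latent models, and the linearly annealed penalty schedule is an illustrative assumption.

```python
# Minimal latent-space collocation sketch (assumptions throughout):
# `dynamics` and `reward` are toy stand-ins, not learned models.
import jax
import jax.numpy as jnp

LATENT_DIM, ACTION_DIM, HORIZON = 4, 2, 20

def dynamics(z, a):
    # Stand-in for a learned latent transition model z' = f(z, a).
    return z + 0.1 * jnp.tanh(jnp.concatenate([a, a]))

def reward(z):
    # Stand-in for a learned reward model; peaks at a "goal" latent.
    goal = jnp.ones(LATENT_DIM)
    return -jnp.sum((z - goal) ** 2)

def collocation_objective(plan, z0, penalty_weight):
    zs, actions = plan  # zs: (H, D) latent states, actions: (H, A)
    # Reward is evaluated at every planned latent state, so even a
    # distant sparse reward shapes the optimization from step one.
    total_reward = jnp.sum(jax.vmap(reward)(zs))
    # Soft dynamics constraint: consecutive latents must be reachable
    # under the model. Collocation relaxes this early in optimization
    # instead of rolling the model forward as shooting methods do.
    prev = jnp.concatenate([z0[None], zs[:-1]], axis=0)
    violations = zs - jax.vmap(dynamics)(prev, actions)
    return -total_reward + penalty_weight * jnp.sum(violations ** 2)

@jax.jit
def plan_step(plan, z0, penalty_weight, lr=0.05):
    grads = jax.grad(collocation_objective)(plan, z0, penalty_weight)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, plan, grads)

z0 = jnp.zeros(LATENT_DIM)
plan = (jnp.zeros((HORIZON, LATENT_DIM)), jnp.zeros((HORIZON, ACTION_DIM)))
for i in range(500):
    # Anneal the penalty upward so the final plan becomes feasible.
    plan = plan_step(plan, z0, penalty_weight=0.1 * (1.0 + i))
```

Because the reward is scored on every planned latent state regardless of whether it is yet reachable, a distant sparse reward can guide the plan immediately; the dynamics penalty is then tightened until the state sequence is consistent with the model. A shooting method would instead optimize actions only and differentiate through the full rollout, which is where the compounding effects mentioned in the abstract arise.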
Related papers
- Open-World Reinforcement Learning over Long Short-Term Imagination [91.28998327423295]
We present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps.
Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
arXiv Detail & Related papers (2024-10-04T17:17:30Z)
- Diffused Task-Agnostic Milestone Planner [13.042155799536657]
We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
arXiv Detail & Related papers (2023-12-06T10:09:22Z)
- Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning [21.995159117991278]
We propose Curiosity CEM (CCEM), an improved version of the Cross-Entropy Method (CEM) that encourages exploration via curiosity.
Our method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate both future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control Suite show that CCEM is substantially more sample-efficient than previous MBRL algorithms; a minimal CEM planning sketch is included after this list.
arXiv Detail & Related papers (2023-03-07T10:48:20Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Skill-based Model-based Reinforcement Learning [18.758245582997656]
Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors.
We propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space.
We harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space.
arXiv Detail & Related papers (2022-07-15T16:06:33Z)
- Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine how the choice of action space affects instruction-guided navigation in continuous environments.
We measure task performance and estimated execution time on a profiled LoCoBot robot.
Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range of time.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that overcomes these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance while being around 4-8 times more sample-efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
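For reference, below is a minimal sketch of vanilla Cross-Entropy Method (CEM) planning, the shooting-style base planner that the Curiosity CEM entry above improves on. The rollout model and reward are toy stand-ins (assumptions, not any paper's learned components); CCEM would additionally add a curiosity bonus to the scores.

```python
# Minimal CEM planning sketch; `rollout_return` is a toy stand-in.
import jax
import jax.numpy as jnp

ACTION_DIM, HORIZON = 2, 12
POP, ELITES, ITERS = 256, 32, 8

def rollout_return(actions):
    # Toy deterministic environment: the state drifts with the
    # actions, and reward is negative distance to a goal state.
    # A model-based planner would use its learned dynamics here.
    goal = jnp.ones(ACTION_DIM)
    states = jnp.cumsum(0.2 * jnp.tanh(actions), axis=0)
    return -jnp.sum((states - goal) ** 2)

def cem_plan(key, mean, std):
    for _ in range(ITERS):
        key, sub = jax.random.split(key)
        # Sample a population of candidate action sequences.
        samples = mean + std * jax.random.normal(
            sub, (POP, HORIZON, ACTION_DIM))
        scores = jax.vmap(rollout_return)(samples)
        # Refit the sampling distribution to the top-scoring elites.
        elite_idx = jnp.argsort(scores)[-ELITES:]
        elites = samples[elite_idx]
        mean = jnp.mean(elites, axis=0)
        std = jnp.std(elites, axis=0) + 1e-3
    return mean  # in MPC style, only the first action is executed

key = jax.random.PRNGKey(0)
plan = cem_plan(key, jnp.zeros((HORIZON, ACTION_DIM)),
                jnp.ones((HORIZON, ACTION_DIM)))
```

Because CEM optimizes action sequences and scores them by rolling the model forward, it is exactly the kind of shooting method that the LatCo abstract argues struggles on long horizons, which is why the sparse-reward comparisons above are informative.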
This list is automatically generated from the titles and abstracts of the papers on this site.