Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
- URL: http://arxiv.org/abs/2409.09513v1
- Date: Sat, 14 Sep 2024 19:30:53 GMT
- Title: Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
- Authors: Joseph Clinton, Robert Lieck
- Abstract summary: We introduce Planning Tokens, which contain high-level, long time-scale information about the agent's future.
We demonstrate that Planning Tokens improve the interpretability of the model's policy through plan visualisations and attention maps.
- Score: 1.8416014644193066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised learning approaches to offline reinforcement learning, particularly those utilizing the Decision Transformer, have shown effectiveness in continuous environments and for sparse rewards. However, they often struggle with long-horizon tasks due to the high compounding error of auto-regressive models. To overcome this limitation, we go beyond next-token prediction and introduce Planning Tokens, which contain high-level, long time-scale information about the agent's future. Predicting dual time-scale tokens at regular intervals enables our model to use these long-horizon Planning Tokens as a form of implicit planning to guide its low-level policy and reduce compounding error. This architectural modification significantly enhances performance on long-horizon tasks, establishing a new state-of-the-art in complex D4RL environments. Additionally, we demonstrate that Planning Tokens improve the interpretability of the model's policy through plan visualisations and attention maps.
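To make the dual time-scale idea concrete, here is a minimal sketch of how plan tokens could be interleaved into a Decision-Transformer-style input sequence. The module names, the plan interval K, and the embedding layout are illustrative assumptions, not the authors' exact implementation:

```python
# Sketch: interleave one high-level plan token per window of K steps
# into the usual (return, state, action) token stream. All names and
# shapes here are assumptions for illustration.
import torch
import torch.nn as nn

class PlanningTokenSequencer(nn.Module):
    def __init__(self, state_dim, act_dim, plan_dim, embed_dim, plan_interval=20):
        super().__init__()
        self.plan_interval = plan_interval          # K: steps between plan tokens
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.embed_return = nn.Linear(1, embed_dim)
        self.embed_plan = nn.Linear(plan_dim, embed_dim)  # long time-scale token

    def forward(self, returns, states, actions, plans):
        # returns: (B, T, 1), states: (B, T, s), actions: (B, T, a)
        # plans:   (B, T // K, plan_dim) -- one long-horizon token per window
        B, T, _ = states.shape
        tokens = []
        for t in range(T):
            if t % self.plan_interval == 0:
                tokens.append(self.embed_plan(plans[:, t // self.plan_interval]))
            tokens.append(self.embed_return(returns[:, t]))
            tokens.append(self.embed_state(states[:, t]))
            tokens.append(self.embed_action(actions[:, t]))
        return torch.stack(tokens, dim=1)   # (B, T*3 + T//K, embed_dim)

seq = PlanningTokenSequencer(state_dim=17, act_dim=6, plan_dim=4, embed_dim=64)
B, T = 2, 40
out = seq(torch.randn(B, T, 1), torch.randn(B, T, 17),
          torch.randn(B, T, 6), torch.randn(B, T // 20, 4))
print(out.shape)  # (2, 40*3 + 2, 64)
```

At inference time the model would first predict the plan token for the upcoming window and then condition on it autoregressively when predicting low-level actions, which is the implicit-planning mechanism the abstract describes.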
Related papers
- Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning [17.760679318994384]
We present a novel hierarchical transformer-based approach leveraging a learned quantizer of the space.
This quantization enables the training of a simpler zone-conditioned low-level policy and simplifies planning.
Our proposed approach achieves state-of-the-art results in complex long-distance navigation environments.
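As a rough illustration of what a learned quantizer of the space buys, the sketch below clusters positions from an offline dataset into discrete zones; plain k-means stands in for the paper's learned quantizer, and the zone-conditioned low-level policy is indicated only by its interface:

```python
# Sketch: quantize a continuous navigation space into discrete zones so
# high-level planning reduces to a short sequence of zone ids. K-means
# is a stand-in for the paper's learned quantizer; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0, 10, size=(5000, 2))        # offline dataset positions
codebook = positions[rng.choice(len(positions), 32)]  # 32 zone centers

for _ in range(10):  # plain k-means updates
    zones = np.argmin(((positions[:, None] - codebook) ** 2).sum(-1), axis=1)
    for z in range(len(codebook)):
        if (zones == z).any():
            codebook[z] = positions[zones == z].mean(axis=0)

def quantize(pos):
    return int(np.argmin(((codebook - pos) ** 2).sum(-1)))

# The low-level policy would be conditioned only on (state, next_zone).
start, goal = np.array([1.0, 1.0]), np.array([9.0, 9.0])
print("plan:", quantize(start), "->", quantize(goal))
```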
arXiv Detail & Related papers (2024-11-12T12:49:41Z)
- Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks [12.239868705130178]
We propose a data-driven hierarchical framework that generates and updates plans based on instructions specified in linear temporal logic (LTL).
Our method decomposes temporal tasks into a chain of options with hierarchical reinforcement learning from offline non-expert datasets.
We devise a determinantal-guided posterior sampling technique during batch generation, which improves the speed and diversity of diffusion-generated options.
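To illustrate determinantal guidance, the sketch below greedily selects a subset of candidate options whose similarity-kernel log-determinant is large; this standard greedy DPP-MAP procedure stands in for the paper's posterior sampling technique, and the embeddings are random placeholders:

```python
# Sketch: pick a diverse subset of diffusion-generated options by
# greedily maximizing the log-determinant of their similarity kernel
# (greedy DPP-MAP), a stand-in for the paper's exact sampler.
import numpy as np

def greedy_dpp(features, k):
    # features: (N, d) embeddings of candidate options
    sim = features @ features.T                     # similarity kernel
    selected, remaining = [], list(range(len(features)))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            gain = np.linalg.slogdet(sim[np.ix_(idx, idx)]
                                     + 1e-6 * np.eye(len(idx)))[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(1)
options = rng.normal(size=(64, 16))    # 64 candidate options, 16-d embeddings
print(greedy_dpp(options, k=8))        # indices of a diverse subset
```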
arXiv Detail & Related papers (2024-10-03T11:10:37Z)
- Adaptive Planning with Generative Models under Uncertainty [20.922248169620783]
Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains.
While replanning at every timestep might seem intuitive, since it bases each decision on the most recent observation, it incurs substantial computational cost.
Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories.
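A minimal sketch of one such adaptive policy follows, assuming a drift-triggered replanning rule; the tolerance, the toy dynamics, and the straight-line "planner" are stand-ins for the paper's generative model:

```python
# Sketch: execute a cached long-horizon plan and replan only when the
# observed state drifts from the model's predicted trajectory. The
# threshold and the toy planner/dynamics are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)

def env_step(s, a):
    # Toy stochastic dynamics: intended motion plus small noise.
    return s + a + rng.normal(scale=0.05, size=s.shape)

def plan_fn(s, h):
    # Toy "generative planner": a straight-line rollout toward the origin.
    actions = np.tile(-s / h, (h, 1))
    states = s + np.cumsum(actions, axis=0)
    return states, actions

def adaptive_control(x0, horizon=50, tol=0.5, max_steps=200):
    state = x0
    plan_states, plan_actions = plan_fn(state, horizon)
    t = 0
    for _ in range(max_steps):
        state = env_step(state, plan_actions[t])
        t += 1
        # Replan only when reality drifts from the prediction
        # or the cached plan is exhausted.
        if t >= horizon or np.linalg.norm(state - plan_states[t - 1]) > tol:
            plan_states, plan_actions = plan_fn(state, horizon)
            t = 0
    return state

print(adaptive_control(np.array([3.0, -2.0])))
```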
arXiv Detail & Related papers (2024-08-02T18:07:53Z)
- Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans [25.326624139426514]
Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks.
However, due to their nature as generative models, diffusion models are not guaranteed to generate feasible plans.
We propose a novel approach to refine unreliable plans generated by diffusion models by providing refining guidance to error-prone plans.
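The detect-and-refine idea can be pictured as follows: flag transitions that a feasibility check rejects, then nudge only those segments until they pass. Here a simple step-size limit stands in for a learned dynamics model, and the update rule is an illustrative assumption:

```python
# Sketch: detect infeasible transitions in a generated plan, then
# locally refine only the flagged segments. The max-step feasibility
# test and midpoint-shrink update are stand-ins for learned guidance.
import numpy as np

def transition_error(plan, max_step=0.3):
    deltas = np.linalg.norm(np.diff(plan, axis=0), axis=1)
    return deltas - max_step          # > 0 means an infeasible jump

def refine(plan, iters=100, lr=0.1):
    plan = plan.copy()
    for _ in range(iters):
        bad = np.where(transition_error(plan) > 0)[0]
        if len(bad) == 0:
            break
        for i in bad:                 # shrink infeasible segments
            mid = (plan[i] + plan[i + 1]) / 2
            plan[i] += lr * (mid - plan[i])
            plan[i + 1] += lr * (mid - plan[i + 1])
    return plan

rng = np.random.default_rng(2)
plan = np.cumsum(rng.normal(scale=0.3, size=(20, 2)), axis=0)
print("infeasible steps before/after:",
      (transition_error(plan) > 0).sum(),
      (transition_error(refine(plan)) > 0).sum())
```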
arXiv Detail & Related papers (2023-10-30T10:35:42Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that composes expert foundation models, trained individually on language, vision, and action data, to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded in visual-motor control through an inverse dynamics model that infers actions from the generated videos.
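The composition itself is mostly plumbing, sketched below with all three expert models stubbed out; only the interfaces (language model proposes subgoals, video model imagines frames, inverse dynamics recovers actions between consecutive frames) follow the summary above:

```python
# Sketch: LLM -> symbolic subgoals -> imagined video per subgoal ->
# inverse dynamics between frames. All three models are toy stubs;
# only the hierarchical plumbing is illustrated.
from typing import Callable, List
import numpy as np

def hierarchical_plan(task: str,
                      llm: Callable[[str], List[str]],
                      video_model: Callable[[str], np.ndarray],
                      inverse_dynamics: Callable[[np.ndarray, np.ndarray], float]):
    actions = []
    for subgoal in llm(task):                 # symbolic plan
        frames = video_model(subgoal)         # (T, H, W, C) imagined video
        for f0, f1 in zip(frames[:-1], frames[1:]):
            actions.append(inverse_dynamics(f0, f1))
    return actions

rng = np.random.default_rng(3)
acts = hierarchical_plan(
    "put the block in the drawer",
    llm=lambda task: ["open drawer", "pick block", "place block"],
    video_model=lambda g: rng.random((4, 8, 8, 3)),
    inverse_dynamics=lambda a, b: float((b - a).mean()),
)
print(len(acts), "actions")   # 3 subgoals x 3 frame pairs
```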
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
- Improving Long-Horizon Imitation Through Instruction Prediction [93.47416552953075]
In this work, we explore the use of an often unused source of auxiliary supervision: language.
Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction.
In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans.
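The auxiliary objective is straightforward to sketch: alongside behavior cloning, the agent's shared representation is also asked to predict the instruction. The layer sizes, loss weight, and single-token instruction target below are illustrative assumptions:

```python
# Sketch: behavior cloning plus an instruction-prediction head as
# auxiliary supervision. Architecture and loss weight are assumptions.
import torch
import torch.nn as nn

class InstructionPredictingAgent(nn.Module):
    def __init__(self, obs_dim=32, act_dim=8, vocab=128, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, act_dim)
        self.instr_head = nn.Linear(hidden, vocab)   # predicts instruction token

    def loss(self, obs, expert_action, instr_token, aux_weight=0.5):
        h = self.trunk(obs)
        bc = nn.functional.mse_loss(self.policy_head(h), expert_action)
        aux = nn.functional.cross_entropy(self.instr_head(h), instr_token)
        return bc + aux_weight * aux   # auxiliary loss shapes the representation

agent = InstructionPredictingAgent()
loss = agent.loss(torch.randn(16, 32), torch.randn(16, 8),
                  torch.randint(0, 128, (16,)))
loss.backward()
```

The weight on the auxiliary term controls how strongly the instruction signal shapes the shared representation.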
arXiv Detail & Related papers (2023-06-21T20:47:23Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Visual Learning-based Planning for Continuous High-Dimensional POMDPs [81.16442127503517]
Visual Tree Search (VTS) is a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning.
VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner.
We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train.
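One way to picture how learned observation models enter a POMDP planner is the particle-belief update sketched below, where an observation model's likelihood reweights state hypotheses at a tree node. This is an interpretation of the role such models play, not the VTS algorithm itself, and the quadratic log-likelihood is a stand-in for a deep generative model:

```python
# Sketch: reweight belief particles by the likelihood a (stubbed)
# observation model assigns to the incoming image observation.
import numpy as np

def update_belief(particles, weights, obs, obs_loglik, transition):
    # particles: (N, d) hypothesized states; obs_loglik(obs, state) -> float
    particles = np.array([transition(p) for p in particles])
    logw = np.log(weights + 1e-12) + np.array([obs_loglik(obs, p) for p in particles])
    w = np.exp(logw - logw.max())
    return particles, w / w.sum()

rng = np.random.default_rng(4)
particles = rng.normal(size=(100, 2))
weights = np.full(100, 1 / 100)
particles, weights = update_belief(
    particles, weights, obs=np.array([0.5, 0.5]),
    obs_loglik=lambda o, s: -np.sum((o - s) ** 2),   # stand-in for a deep model
    transition=lambda s: s + 0.1,
)
print(weights.max())
```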
arXiv Detail & Related papers (2021-12-17T11:53:31Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
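A minimal sketch of collocation in a latent space follows: the sequence of latent states is optimized directly, trading off the task objective against violations of a dynamics model. The fixed linear dynamics, horizon, and penalty weight are illustrative, not the paper's learned model:

```python
# Sketch: optimize latent states (not actions) subject to a soft
# dynamics-feasibility penalty. The linear "dynamics" is a stand-in
# for a learned latent state space model.
import torch

def collocate(z0, z_goal, dynamics, horizon=20, steps=300, lam=10.0):
    # Initialize the latent trajectory as a straight line to the goal.
    zs = torch.stack([z0 + (z_goal - z0) * t / horizon
                      for t in range(1, horizon + 1)]).requires_grad_(True)
    opt = torch.optim.Adam([zs], lr=0.05)
    for _ in range(steps):
        prev = torch.cat([z0.unsqueeze(0), zs[:-1]])
        dyn_violation = ((zs - dynamics(prev)) ** 2).sum()  # feasibility
        goal_cost = ((zs[-1] - z_goal) ** 2).sum()          # task objective
        loss = goal_cost + lam * dyn_violation
        opt.zero_grad()
        loss.backward()
        opt.step()
    return zs.detach()

A = torch.eye(4) * 0.95  # stand-in for a learned latent dynamics model
plan = collocate(torch.zeros(4), torch.ones(4), dynamics=lambda z: z @ A)
print(plan[-1])
```

Actions would then be recovered from the optimized state sequence, for example with an inverse dynamics model.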
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks in which the background is replaced with natural videos that contain complex but task-irrelevant information.
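A common instantiation of such an objective, assumed here, trains an encoder so that the next latent is predictable from the current one via a contrastive loss, which pressures the representation to drop unpredictable background detail:

```python
# Sketch: a temporal-predictive-coding style loss. The encoder keeps
# what a latent forward model can predict across time; the contrastive
# form and sizes are illustrative assumptions.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
pred = nn.Linear(16, 16)        # latent forward model

def tpc_loss(obs_t, obs_t1):
    z_t, z_t1 = enc(obs_t), enc(obs_t1)
    z_hat = pred(z_t)
    logits = z_hat @ z_t1.T     # (B, B): match each prediction to its true next latent
    labels = torch.arange(len(obs_t))
    return nn.functional.cross_entropy(logits, labels)

loss = tpc_loss(torch.randn(32, 64), torch.randn(32, 64))
loss.backward()
```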
arXiv Detail & Related papers (2021-06-14T04:31:15Z)