Related papers: Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference

Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference

URL: http://arxiv.org/abs/2402.04647v3
Date: Thu, 31 Oct 2024 07:56:57 GMT
Title: Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference
Authors: Deqian Kong, Dehong Xu, Minglu Zhao, Bo Pang, Jianwen Xie, Andrew Lizarraga, Yuhao Huang, Sirui Xie, Ying Nian Wu,
Abstract summary: We study generative modeling for planning with datasets repurposed from offline reinforcement learning. We introduce the Latent Plan Transformer (), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
Score: 53.419249906014194
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return. LPT can be learned with maximum likelihood estimation on trajectory-return pairs. In learning, posterior sampling of the latent variable naturally integrates sub-trajectories to form a consistent abstraction despite the finite context. At test time, the latent variable is inferred from an expected return before policy execution, realizing the idea of planning as inference. Our experiments demonstrate that LPT can discover improved decisions from sub-optimal trajectories, achieving competitive performance across several benchmarks, including Gym-Mujoco, Franka Kitchen, Maze2D, and Connect Four. It exhibits capabilities in nuanced credit assignments, trajectory stitching, and adaptation to environmental contingencies. These results validate that latent variable inference can be a strong alternative to step-wise reward prompting.

Related papers

Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning [15.103861901247125]
We propose a three-stage framework to develop robust reasoning models for sparse environments.<n>Our framework bootstraps efficient reasoning using the proposed planning quaternions with long-short chain-of-thought fusion.<n>Experiments on ALFWorld, ScienceWorld, and WebShop demonstrate that our approach achieves state-of-the-art with significant token efficiency.
arXiv Detail & Related papers (2025-08-05T02:56:58Z)
Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks [12.239868705130178]
We propose a data-driven hierarchical framework that generates and updates plans based on instruction specified by linear temporal logic (LTL) Our method decomposes temporal tasks into chain of options with hierarchical reinforcement learning from offline non-expert datasets. We devise a determinantal-guided posterior sampling technique during batch generation, which improves the speed and diversity of diffusion generated options.
arXiv Detail & Related papers (2024-10-03T11:10:37Z)
Adaptive Planning with Generative Models under Uncertainty [20.922248169620783]
Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains. While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges. Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories.
arXiv Detail & Related papers (2024-08-02T18:07:53Z)
Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL) QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM) Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency [53.8779374188643]
We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting. Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon. At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
arXiv Detail & Related papers (2023-09-29T16:36:39Z)
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments [34.78461208843929]
We introduce an UNcertainty-awaRESion Transformer (UNREST) for planning in driving environments. UNREST estimates uncertainties by conditional mutual information between transitions and returns. We replace the global returns in decision transformers with truncated returns less affected by environments to learn from actual outcomes.
arXiv Detail & Related papers (2023-09-28T12:44:51Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems. We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.