Planning with Sequence Models through Iterative Energy Minimization
- URL: http://arxiv.org/abs/2303.16189v1
- Date: Tue, 28 Mar 2023 17:53:22 GMT
- Title: Planning with Sequence Models through Iterative Energy Minimization
- Authors: Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela
- Abstract summary: We suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization.
We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy.
We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments.
- Score: 22.594413287842574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown that sequence modeling can be effectively used to
train reinforcement learning (RL) policies. However, the success of applying
existing sequence models to planning, in which we wish to obtain a trajectory
of actions to reach some goal, is less straightforward. The typical
autoregressive generation procedures of sequence models preclude sequential
refinement of earlier steps, which limits the effectiveness of a predicted
plan. In this paper, we suggest an approach towards integrating planning with
sequence models based on the idea of iterative energy minimization, and
illustrate how such a procedure leads to improved RL performance across
different tasks. We train a masked language model to capture an implicit energy
function over trajectories of actions, and formulate planning as finding a
trajectory of actions with minimum energy. We illustrate how this procedure
enables improved performance over recent approaches across BabyAI and Atari
environments. We further demonstrate unique benefits of our iterative
optimization procedure, involving new task generalization, test-time
constraints adaptation, and the ability to compose plans together. Project
website: https://hychen-naza.github.io/projects/LEAP
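As a rough, self-contained illustration of the core idea (not the authors' implementation): in the sketch below, `trajectory_energy` is a hypothetical stand-in for the energy implicitly defined by the trained masked language model, and the refinement loop shows how any step of a draft plan can be revised during planning, unlike left-to-right autoregressive decoding.
```python
# Hedged sketch: planning as iterative energy minimization over an action
# trajectory. In LEAP the energy would come from a trained masked language
# model; here `trajectory_energy` is a toy stand-in so the loop is runnable.
import random

ACTIONS = ["left", "right", "forward", "pickup"]    # toy action vocabulary
HORIZON = 8                                         # plan length
GOAL_PLAN = ["forward"] * 6 + ["pickup", "pickup"]  # pretend low-energy plan

def trajectory_energy(traj):
    """Toy energy: lower when the plan matches GOAL_PLAN (model stand-in)."""
    return sum(a != g for a, g in zip(traj, GOAL_PLAN))

def refine(traj, iters=50, seed=0):
    """Repeatedly 'mask' one position and replace it with the lowest-energy
    action; earlier steps can be revised, unlike autoregressive decoding."""
    rng = random.Random(seed)
    traj = list(traj)
    for _ in range(iters):
        t = rng.randrange(len(traj))
        traj[t] = min(ACTIONS,
                      key=lambda a: trajectory_energy(traj[:t] + [a] + traj[t + 1:]))
    return traj

if __name__ == "__main__":
    rng = random.Random(1)
    draft = [rng.choice(ACTIONS) for _ in range(HORIZON)]
    plan = refine(draft)
    print("draft energy  :", trajectory_energy(draft))
    print("refined energy:", trajectory_energy(plan), plan)
```
In the actual method the masked language model both scores trajectories and proposes replacements for masked positions; it is this iterative minimization that admits the test-time constraints and plan composition mentioned in the abstract.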
Related papers
- Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling [23.62433580021779]
We advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached.
A nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning.
Our method is evaluated on the VirtualHome-Env benchmark, showing strong performance and better scaling with inference-time computation.
arXiv Detail & Related papers (2024-10-02T11:42:49Z)
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models and find that a subset of them contributes little to the generation process.
We propose a novel model fine-tuning method that makes full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Adaptive Planning with Generative Models under Uncertainty [20.922248169620783]
Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains.
While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges.
Our work addresses this challenge by introducing a simple adaptive planning policy that leverages the generative model's ability to predict long-horizon state trajectories.
arXiv Detail & Related papers (2024-08-02T18:07:53Z)
- Model-based Reinforcement Learning for Parameterized Action Spaces [11.94388805327713]
We propose a novel model-based reinforcement learning algorithm for parameterized action Markov decision processes (PAMDPs).
The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control.
Our empirical results on several standard benchmarks show that our algorithm achieves better sample efficiency and performance than state-of-the-art PAMDP methods.
arXiv Detail & Related papers (2024-04-03T19:48:13Z)
- Guiding Language Model Reasoning with Planning Tokens [122.43639723387516]
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks.
We propose a hierarchical generation scheme to encourage a more structured generation of chain-of-thought steps.
Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme.
arXiv Detail & Related papers (2023-10-09T13:29:37Z)
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach that distills knowledge from model-based planning into the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The bounds we then derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization [26.497575737219794]
The Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose to address its slow convergence by interleaving CEM and gradient descent steps when optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces; a toy sketch of this interleaving follows below.
arXiv Detail & Related papers (2020-04-19T03:54:50Z)
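A minimal sketch of the CEM/gradient interleaving described in the last entry above, under stated assumptions: the quadratic `cost` is a toy stand-in for a learned, differentiable model-rollout cost, and the names and hyperparameters are illustrative rather than taken from the paper.
```python
# Hedged sketch: interleaving Cross-Entropy Method (CEM) sampling with
# gradient descent on the sampled action sequences.
import numpy as np

HORIZON, ACT_DIM = 10, 2
TARGET = np.ones((HORIZON, ACT_DIM))   # toy optimum of the stand-in cost

def cost(actions):
    """Toy differentiable cost; a learned model-rollout cost in practice."""
    return float(np.sum((actions - TARGET) ** 2))

def cost_grad(actions):
    """Analytic gradient of the toy cost (autodiff in practice)."""
    return 2.0 * (actions - TARGET)

def plan(iters=20, pop=64, elites=8, grad_steps=3, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((HORIZON, ACT_DIM))
    std = np.ones((HORIZON, ACT_DIM))
    for _ in range(iters):
        # 1) CEM step: sample a population of action sequences.
        samples = mean + std * rng.standard_normal((pop, HORIZON, ACT_DIM))
        # 2) Gradient steps: nudge every sample downhill on the cost.
        for _ in range(grad_steps):
            samples = samples - lr * cost_grad(samples)
        # 3) Refit the sampling distribution to the lowest-cost (elite) samples.
        order = np.argsort([cost(s) for s in samples])
        elite = samples[order[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

if __name__ == "__main__":
    best = plan()
    print("final cost:", cost(best))
```
The design point this illustrates is that the elite-refit step retains CEM's global, population-based search while the inner gradient steps exploit differentiability to converge in fewer iterations.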