Planning with Sequence Models through Iterative Energy Minimization
- URL: http://arxiv.org/abs/2303.16189v1
- Date: Tue, 28 Mar 2023 17:53:22 GMT
- Title: Planning with Sequence Models through Iterative Energy Minimization
- Authors: Hongyi Chen, Yilun Du, Yiye Chen, Joshua Tenenbaum, Patricio A. Vela
- Abstract summary: We suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization.
We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy.
We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown that sequence modeling can be effectively used to
train reinforcement learning (RL) policies. However, the success of applying
existing sequence models to planning, in which we wish to obtain a trajectory
of actions to reach some goal, is less straightforward. The typical
autoregressive generation procedures of sequence models preclude sequential
refinement of earlier steps, which limits the effectiveness of a predicted
plan. In this paper, we suggest an approach towards integrating planning with
sequence models based on the idea of iterative energy minimization, and
illustrate how such a procedure leads to improved RL performance across
different tasks. We train a masked language model to capture an implicit energy
function over trajectories of actions, and formulate planning as finding a
trajectory of actions with minimum energy. We illustrate how this procedure
enables improved performance over recent approaches across BabyAI and Atari
environments. We further demonstrate unique benefits of our iterative
optimization procedure, involving new task generalization, test-time
constraints adaptation, and the ability to compose plans together. Project
website: https://hychen-naza.github.io/projects/LEAP
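As a rough, hypothetical sketch of the planning loop the abstract describes: start from a random action sequence, repeatedly mask a few positions, let the masked model propose replacements, and keep the candidate whenever the implicit energy drops. The `energy_model` and `propose` callables below are assumed stand-ins for the trained masked language model, not the paper's actual API, and the greedy accept rule simplifies LEAP's optimization procedure.

```python
import torch

def plan_by_energy_minimization(energy_model, propose, seq_len, n_actions,
                                n_iters=100, mask_frac=0.2):
    """Iteratively refine an action sequence by minimizing a learned energy.

    energy_model(traj) -> scalar energy for a (seq_len,) tensor of action ids.
    propose(traj, idx) -> candidate action ids for the masked positions.
    Both are hypothetical stand-ins for the trained masked language model.
    """
    traj = torch.randint(n_actions, (seq_len,))        # random initial plan
    best_energy = energy_model(traj)
    for _ in range(n_iters):
        # Mask a random subset of positions and propose replacements,
        # analogous to one step of masked-LM-driven refinement.
        n_mask = max(1, int(mask_frac * seq_len))
        idx = torch.randperm(seq_len)[:n_mask]
        candidate = traj.clone()
        candidate[idx] = propose(traj, idx)
        cand_energy = energy_model(candidate)
        if cand_energy < best_energy:                  # keep lower-energy plan
            traj, best_energy = candidate, cand_energy
    return traj
```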
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Model-based Reinforcement Learning for Parameterized Action Spaces [11.94388805327713]
We propose a novel model-based reinforcement learning algorithm for parameterized-action MDPs (PAMDPs).
The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control.
Our empirical results on several standard benchmarks show that our algorithm achieves superior sample efficiency and performance compared to state-of-the-art PAMDP methods.
arXiv Detail & Related papers (2024-04-03T19:48:13Z)
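The summary above mentions planning with a modified Model Predictive Path Integral (MPPI) controller. A minimal, generic MPPI update over the continuous action parameters might look as follows; `dynamics` and `reward` are hypothetical learned models, and the paper's parameterized-action modifications are omitted.

```python
import numpy as np

def mppi_plan(dynamics, reward, state, nominal, n_samples=64, horizon=10,
              noise_std=0.5, temperature=1.0):
    """One MPPI update over a continuous action plan.

    dynamics(s, a) -> next state; reward(s, a) -> scalar. Both are
    hypothetical learned models, not the paper's actual implementation.
    nominal: (horizon, act_dim) current plan to be refined.
    """
    noise = np.random.randn(n_samples, horizon, nominal.shape[-1]) * noise_std
    returns = np.zeros(n_samples)
    for k in range(n_samples):
        s = state
        for t in range(horizon):
            a = nominal[t] + noise[k, t]
            returns[k] += reward(s, a)
            s = dynamics(s, a)
    # Exponentially weight sampled trajectories by return, then blend
    # their perturbations into the nominal plan.
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    return nominal + np.einsum('k,kta->ta', w, noise)
```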
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic stepsize scheduling scheme, grounded in theory, that aims to simplify the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally-optimal stepsize in the direction of the stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
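One common way to realize a locally-optimal stepsize along the gradient direction is the quadratic-model estimate eta = g·g / (g·Hg), with the curvature term approximated by a finite difference of gradients. The sketch below illustrates that generic idea under those assumptions; it is not the paper's exact estimator.

```python
import numpy as np

def local_stepsize(grad_fn, x, eps=1e-4):
    """Estimate a locally optimal stepsize along the negative gradient.

    Uses the quadratic-model formula eta = g.g / (g.Hg), approximating the
    Hessian-vector product Hg with a finite difference of gradients.
    A generic sketch of the idea, not the paper's exact estimator.
    """
    g = grad_fn(x)
    hg = (grad_fn(x + eps * g) - g) / eps     # finite-difference H @ g
    curvature = float(g @ hg)
    if curvature <= 0:                        # fall back on flat/concave regions
        return 1e-2
    return float(g @ g) / curvature

# Usage: x = x - local_stepsize(grad_fn, x) * grad_fn(x)
```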
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach that distills the results of model-based planning into the learned policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
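Distilling optimized action sequences into a policy can be as simple as behavior cloning on planner outputs. The following sketch shows that baseline form; the paper's actual objective additionally carries theoretical improvement guarantees, which this toy step does not capture.

```python
import torch
import torch.nn.functional as F

def distill_step(policy, optimizer, states, planner_actions):
    """One behavior-cloning step: regress the policy onto actions produced
    by a model-based planner. A generic distillation sketch; `policy` is a
    hypothetical network mapping states to action logits."""
    logits = policy(states)
    loss = F.cross_entropy(logits, planner_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```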
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
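A plausible reading of the self-imitation scheme: actions selected while conditioning on intermediate subgoals serve as imitation targets for the same states conditioned on the final goal. The `policy(states, goals)` signature below is an assumption for illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def self_imitation_step(policy, optimizer, states, subgoals, final_goal):
    """Distill a subgoal-conditioned policy into a goal-conditioned one.

    Actions chosen under subgoal conditioning become imitation targets for
    the same states conditioned on the final goal. Signatures are assumed
    for illustration (hypothetical, not the paper's actual interface)."""
    with torch.no_grad():
        # Teacher: actions the policy picks when guided by nearby subgoals.
        teacher_actions = policy(states, subgoals).argmax(-1)
    goal = final_goal.unsqueeze(0).expand(states.shape[0], -1)
    logits = policy(states, goal)                 # condition on final goal
    loss = F.cross_entropy(logits, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```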
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Position Paper: Online Modeling for Offline Planning [2.8326418377665346]
A key part of AI planning research is the representation of action models.
Despite the maturity of the field, AI planning technology is still rarely used outside the research community.
We argue that this is because the modeling process is assumed to have taken place and been completed before planning begins.
arXiv Detail & Related papers (2022-06-07T14:48:08Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
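Latent-space collocation can be sketched as jointly optimizing a sequence of latent states and actions, with a penalty that softly enforces consistency with a learned latent dynamics model. The `dynamics` and `reward` models below are hypothetical placeholders, and the soft-penalty formulation is one of several ways to impose the collocation constraint.

```python
import torch

def collocate(dynamics, reward, z0, horizon, act_dim, z_dim,
              n_steps=200, lam=10.0, lr=0.1):
    """Plan by jointly optimizing latent states and actions.

    dynamics(z, a) -> next latent state (batched); reward(z) -> per-state
    reward. Both are hypothetical learned models standing in for the
    paper's latent-space components."""
    zs = torch.randn(horizon, z_dim, requires_grad=True)
    acts = torch.zeros(horizon, act_dim, requires_grad=True)
    opt = torch.optim.Adam([zs, acts], lr=lr)
    for _ in range(n_steps):
        prev = torch.cat([z0.unsqueeze(0), zs[:-1]])
        # Penalize states the dynamics model cannot reach (the collocation
        # constraint), while maximizing predicted reward along the path.
        violation = ((dynamics(prev, acts) - zs) ** 2).sum()
        loss = -reward(zs).sum() + lam * violation
        opt.zero_grad()
        loss.backward()
        opt.step()
    return acts.detach()
```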
- Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization [26.497575737219794]
The Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose interleaving CEM updates with gradient descent steps when optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces.
arXiv Detail & Related papers (2020-04-19T03:54:50Z)
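A minimal version of the interleaving idea: alternate a CEM resampling step with a gradient step on the elite mean. Here `cost` and `grad` are assumed to be a differentiable model rollout and its gradient; the paper's exact scheduling of the two updates may differ.

```python
import numpy as np

def hybrid_cem_gd(cost, grad, act_dim, horizon, iters=20,
                  pop=64, elite=8, sigma=0.5, lr=0.05):
    """Alternate CEM resampling with gradient steps on the elite mean.

    cost(plan) -> scalar; grad(plan) -> d cost / d plan. Both hypothetical.
    A generic sketch of the interleaving idea, not the paper's exact method."""
    mean = np.zeros((horizon, act_dim))
    for _ in range(iters):
        # CEM step: sample around the mean, refit to the elite samples.
        samples = mean + sigma * np.random.randn(pop, horizon, act_dim)
        costs = np.array([cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:elite]]
        mean, sigma = elites.mean(0), elites.std(0).mean() + 1e-3
        # Gradient step: locally refine the distribution mean.
        mean = mean - lr * grad(mean)
    return mean
```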
- STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm that synthesizes STRIPS action models in an unsupervised manner with a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)