Planning with Diffusion for Flexible Behavior Synthesis
- URL: http://arxiv.org/abs/2205.09991v1
- Date: Fri, 20 May 2022 07:02:03 GMT
- Title: Planning with Diffusion for Flexible Behavior Synthesis
- Authors: Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine
- Abstract summary: We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
- Score: 125.24438991142573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning methods often use learning only for the
purpose of estimating an approximate dynamics model, offloading the rest of the
decision-making work to classical trajectory optimizers. While conceptually
simple, this combination has a number of empirical shortcomings, suggesting
that learned models may not be well-suited to standard trajectory optimization.
In this paper, we consider what it would look like to fold as much of the
trajectory optimization pipeline as possible into the modeling problem, such
that sampling from the model and planning with it become nearly identical. The
core of our technical approach lies in a diffusion probabilistic model that
plans by iteratively denoising trajectories. We show how classifier-guided
sampling and image inpainting can be reinterpreted as coherent planning
strategies, explore the unusual and useful properties of diffusion-based
planning methods, and demonstrate the effectiveness of our framework in control
settings that emphasize long-horizon decision-making and test-time flexibility.
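As a rough, hedged sketch of the idea above (not the paper's implementation), planning by iterative denoising starts from a pure-noise trajectory, repeatedly applies a learned denoiser, nudges the sample toward high reward (the classifier-guided-sampling view), and clamps known states such as the current state and the goal (the inpainting view). Every name below — `denoise`, `reward_grad`, the step count, and the dimensions — is a toy placeholder.

```python
import numpy as np

def plan_by_denoising(denoise, reward_grad, start, goal,
                      horizon=32, state_dim=4, n_steps=50,
                      guide_scale=0.1, rng=None):
    """Sketch: denoise a whole trajectory, guiding it toward high reward
    and inpainting the fixed endpoints at every step."""
    rng = np.random.default_rng() if rng is None else rng
    traj = rng.standard_normal((horizon, state_dim))    # start from pure noise
    for t in reversed(range(n_steps)):
        traj = denoise(traj, t)                          # one reverse-diffusion step
        traj = traj + guide_scale * reward_grad(traj)    # classifier-style guidance
        traj[0], traj[-1] = start, goal                  # inpaint known states
    return traj

# Toy stand-ins for the learned denoiser and the reward gradient:
denoise = lambda x, t: 0.9 * x
reward_grad = lambda x: -x
plan = plan_by_denoising(denoise, reward_grad,
                         start=np.zeros(4), goal=np.ones(4))
print(plan.shape)  # (32, 4)
```

In the full method the denoiser is a learned diffusion model over trajectories and the guidance gradient comes from a learned objective such as return, rather than these closed-form stand-ins; the sketch only mirrors the control flow.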
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models with RL to optimize learned reward models.
We demonstrate that our approach can outperform the best designs in the offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z) - Deep Generative Models for Decision-Making and Control [4.238809918521607]
The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions to the uncovered problems.
We highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, can be reinterpreted as viable planning strategies for reinforcement learning problems.
arXiv Detail & Related papers (2023-06-15T01:54:30Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
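As a loose illustration only, the generic shape of UCB-style action selection over feature embeddings (a linear-feature stand-in, not the paper's kernel-embedding algorithm) is: score each candidate by its estimated value plus an exploration bonus derived from the embedding covariance.

```python
import numpy as np

def ucb_select(embeddings, theta, cov_inv, beta=1.0):
    """Generic UCB rule: pick the action with the highest
    estimated value + exploration bonus. `embeddings` stand in for
    kernel embeddings of a latent variable model."""
    scores = [float(phi @ theta + beta * np.sqrt(phi @ cov_inv @ phi))
              for phi in embeddings]
    return int(np.argmax(scores))

# Toy usage: three candidate actions with 2-dimensional embeddings.
phis = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
print(ucb_select(phis, theta=np.array([0.2, 0.1]), cov_inv=np.eye(2)))
```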
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The bounds we derive reveal the relationship between model shift and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Predictive Control Using Learned State Space Models via Rolling Horizon
Evolution [2.1016374925364616]
In this paper, we explore this theme by combining evolutionary planning techniques with models learned via deep learning and variational inference.
We demonstrate the approach with an agent that reliably performs online planning in a set of visual navigation tasks.
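For intuition, a minimal sketch of rolling-horizon evolutionary planning against a learned model might look as follows; `model` and `reward` are toy stand-ins for the learned (variational) dynamics and the task objective, and the mutation-plus-elitism scheme is a generic choice rather than the paper's exact operators.

```python
import numpy as np

def rolling_horizon_plan(model, reward, state, horizon=10, pop=32,
                         elite=4, gens=10, act_dim=2, sigma=0.3, rng=None):
    """Evolve a population of action sequences under a learned model and
    return the first action of the best sequence (then re-plan next step)."""
    rng = np.random.default_rng() if rng is None else rng
    plans = rng.standard_normal((pop, horizon, act_dim))

    def fitness(plan):
        s, total = state, 0.0
        for a in plan:                    # roll the learned model forward
            s = model(s, a)
            total += reward(s)
        return total

    for _ in range(gens):
        scores = np.array([fitness(p) for p in plans])
        elites = plans[np.argsort(scores)[-elite:]]
        # Next generation: keep the elites, fill the rest with mutated copies.
        children = (elites[rng.integers(elite, size=pop - elite)]
                    + sigma * rng.standard_normal((pop - elite, horizon, act_dim)))
        plans = np.concatenate([elites, children])
    best = plans[np.argmax([fitness(p) for p in plans])]
    return best[0]

# Toy usage with placeholder dynamics and reward:
model = lambda s, a: s + 0.1 * a
reward = lambda s: -float(s @ s)
print(rolling_horizon_plan(model, reward, state=np.array([1.0, -1.0])))
```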
arXiv Detail & Related papers (2021-06-25T23:23:42Z) - Experimental Design for Overparameterized Learning with Application to
Single Shot Deep Active Learning [5.141687309207561]
Modern machine learning models are trained on large amounts of labeled data, but access to such data is often limited or expensive.
We propose a new design strategy for curating the training set.
arXiv Detail & Related papers (2020-09-27T11:27:49Z) - Prediction-Centric Learning of Independent Cascade Dynamics from Partial
Observations [13.680949377743392]
We address the problem of learning a spreading model such that the predictions generated from it are accurate.
We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach.
We show that tractable inference from the learned model yields better predictions of marginal probabilities than the original model.
arXiv Detail & Related papers (2020-07-13T17:58:21Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI (control as hybrid inference) that naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
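As a hedged, toy illustration of what exploiting the model's differentiability can mean (a hand-derived backward pass through a linear model, not the paper's actor-critic method), one can backpropagate the return through a model rollout to obtain gradients of the return with respect to the planned actions, and then improve the actions by gradient ascent instead of treating the model as a black box.

```python
import numpy as np

# Toy differentiable model: linear dynamics s' = A s + B a, reward r(s) = -||s||^2.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

def rollout_and_grad(s0, actions):
    """Roll the model forward, then backpropagate the return through the
    rollout to get d(return)/d(action_t) for every step."""
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    ret = -sum(float(s @ s) for s in states[1:])
    grads = [np.zeros_like(a) for a in actions]
    g = np.zeros_like(s0)                    # d(return)/d(state) beyond the horizon
    for t in reversed(range(len(actions))):
        g = g - 2.0 * states[t + 1]          # local reward gradient at s_{t+1}
        grads[t] = B.T @ g                   # gradient w.r.t. action a_t
        g = A.T @ g                          # push the gradient back to s_t
    return ret, grads

# Toy usage: one gradient-ascent step on a 5-step action sequence.
acts = [np.zeros(1) for _ in range(5)]
ret, grads = rollout_and_grad(np.array([1.0, 0.0]), acts)
acts = [a + 0.1 * g for a, g in zip(acts, grads)]
```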
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.