Diffused Task-Agnostic Milestone Planner
- URL: http://arxiv.org/abs/2312.03395v1
- Date: Wed, 6 Dec 2023 10:09:22 GMT
- Title: Diffused Task-Agnostic Milestone Planner
- Authors: Mineui Hong, Minjae Kang, Songhwai Oh
- Abstract summary: We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
- Score: 13.042155799536657
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Addressing decision-making problems using sequence modeling to predict future
trajectories has shown promising results in recent years. In this paper, we take a
step further to leverage the sequence predictive method in wider areas such as
long-term planning, vision-based control, and multi-task decision-making. To
this end, we propose a method to utilize a diffusion-based generative sequence
model to plan a series of milestones in a latent space and to have an agent
follow the milestones to accomplish a given task. The proposed method can learn
control-relevant, low-dimensional latent representations of milestones, which
makes it possible to efficiently perform long-term planning and vision-based
control. Furthermore, our approach exploits the generation flexibility of the
diffusion model, which makes it possible to plan diverse trajectories for
multi-task decision-making. We demonstrate the proposed method across offline
reinforcement learning (RL) benchmarks and a visual manipulation environment.
The results show that our approach outperforms offline RL methods in solving
long-horizon, sparse-reward tasks and multi-task problems, while also achieving
state-of-the-art performance on the most challenging vision-based
manipulation benchmark.
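The plan-then-follow idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 2-D space in place of a learned latent space, a hand-written smoothing step in place of a trained diffusion denoiser, and a simple step-toward-target follower in place of a learned policy. The names `plan_milestones` and `follow` are hypothetical.

```python
import random


def plan_milestones(start, goal, num_milestones=5, num_steps=50, seed=0):
    """Toy stand-in for a diffusion planner (assumption, not the paper's model).

    Starts from Gaussian noise and repeatedly refines the sequence while
    clamping its endpoints to the start and goal.
    """
    rng = random.Random(seed)
    dim = len(start)
    n = num_milestones + 2  # interior milestones plus start and goal
    seq = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for step in range(num_steps):
        # Condition on the known endpoints at every refinement step.
        seq[0], seq[-1] = list(start), list(goal)
        # Injected noise shrinks linearly and is zero on the last step.
        noise_scale = 0.1 * (1.0 - (step + 1) / num_steps)
        new_seq = [seq[0]]
        for i in range(1, n - 1):
            # Hand-written "denoiser": pull each point toward its neighbours.
            new_seq.append([
                0.5 * seq[i][d]
                + 0.25 * (seq[i - 1][d] + seq[i + 1][d])
                + rng.gauss(0.0, noise_scale)
                for d in range(dim)
            ])
        new_seq.append(seq[-1])
        seq = new_seq
    return seq[1:]  # interior milestones followed by the goal


def follow(start, milestones, step_size=0.2, tol=0.05, max_steps=1000):
    """Move toward each milestone in turn with bounded steps."""
    pos = list(start)
    path = [tuple(pos)]
    for m in milestones:
        for _ in range(max_steps):
            delta = [m[d] - pos[d] for d in range(len(pos))]
            dist = sum(x * x for x in delta) ** 0.5
            if dist <= tol:
                break  # milestone reached, switch to the next one
            scale = min(step_size, dist) / dist
            pos = [pos[d] + scale * delta[d] for d in range(len(pos))]
            path.append(tuple(pos))
    return path


milestones = plan_milestones((0.0, 0.0), (4.0, 2.0))
path = follow((0.0, 0.0), milestones)
```

In this sketch the follower only ever sees the next milestone, which mirrors the division of labour described in the abstract: a sequence model proposes intermediate targets and a lower-level controller tracks them.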
Related papers
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- Efficient Planning with Latent Diffusion [18.678459478837976]
Temporal abstraction and efficient planning pose significant challenges in offline reinforcement learning.
Latent action spaces offer a more flexible paradigm, capturing only possible actions within the behavior policy support.
This paper presents a unified framework for continuous latent action space representation learning and planning by leveraging latent, score-based diffusion models.
arXiv Detail & Related papers (2023-09-30T08:50:49Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that combines expert foundation models, each trained individually on language, vision, or action data, to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
- Planning with Sequence Models through Iterative Energy Minimization [22.594413287842574]
We suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization.
We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy.
We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments.
arXiv Detail & Related papers (2023-03-28T17:53:22Z)
- Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning [21.995159117991278]
We propose Curiosity CEM, an improved version of the Cross-Entropy Method (CEM) algorithm for encouraging exploration via curiosity.
Our proposed method maximizes the sum of state-action Q values over the planning horizon, in which these Q values estimate the future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control suite show that CCEM is by a large margin more sample-efficient than previous MBRL algorithms.
arXiv Detail & Related papers (2023-03-07T10:48:20Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)