Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty
- URL: http://arxiv.org/abs/2312.01097v1
- Date: Sat, 2 Dec 2023 10:07:17 GMT
- Authors: Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang,
Feng Gao
- Abstract summary: Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
- Score: 56.30846158280031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task planning for embodied AI has been one of the most challenging problems,
and the community has not reached a consensus on its formulation. In this
paper, we aim to tackle this problem with a unified framework consisting of an
end-to-end trainable method and a planning algorithm. Particularly, we propose
a task-agnostic method named 'planning as in-painting'. In this method, we use
a Denoising Diffusion Model (DDM) for plan generation, conditioned on both
language instructions and perceptual inputs under partially observable
environments. Partial observation often leads the model to hallucinate the
plan. Therefore, our diffusion-based method jointly models both state
trajectory and goal estimation to improve the reliability of the generated
plan, given the limited available information at each step. To better leverage
information newly discovered during plan execution and achieve a higher success
rate, we propose an on-the-fly planning algorithm that works together with the
diffusion-based planner. The proposed framework achieves promising performance
in various embodied AI tasks, including vision-language navigation, object
manipulation, and task planning in a photorealistic virtual environment. The
code is available at: https://github.com/joeyy5588/planning-as-inpainting.
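The on-the-fly idea in the abstract (replan whenever execution reveals new information) can be illustrated with a minimal closed-loop sketch. This is not the authors' algorithm: a plain breadth-first search stands in for the diffusion-based planner, the world is a small grid whose walls are only seen when the agent is next to them, and the names `bfs_plan` and `run_episode` are hypothetical.

```python
from collections import deque

def neighbours(cell):
    r, c = cell
    return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

def bfs_plan(start, goal, known_walls, size):
    """Stand-in planner (assumption): shortest path on a size x size grid
    that avoids only the walls observed so far. Returns None if no path."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for nxt in neighbours(cell):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in known_walls and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def run_episode(start, goal, true_walls, size, max_steps=100):
    """Observe, plan, execute one step, repeat. Returns (visited, replans)."""
    known_walls = set()
    pos = start
    visited = [pos]
    replans = 0
    for _ in range(max_steps):
        if pos == goal:
            break
        # Partial observation: only walls adjacent to the agent become known.
        known_walls.update(n for n in neighbours(pos) if n in true_walls)
        plan = bfs_plan(pos, goal, known_walls, size)
        replans += 1
        if plan is None:
            break  # goal unreachable given what has been observed
        pos = plan[1]  # execute only the first step, then replan
        visited.append(pos)
    return visited, replans

if __name__ == "__main__":
    walls = {(0, 2), (1, 2)}
    path, replans = run_episode((0, 0), (0, 3), walls, size=4)
    print(path, replans)
```

Executing only the first step of each plan keeps the sketch simple; a real system could execute several steps and replan only when an observation contradicts the current plan.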
Related papers
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs).
We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.
Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z)
- GenPlan: Generative Sequence Models as Adaptive Planners [0.0]
Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations.
However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks.
We propose GenPlan, a discrete-flow model for adaptive planning that enables sample-generative exploration and exploitation.
arXiv Detail & Related papers (2024-12-11T17:32:33Z)
- PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
- Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios.
This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras.
We evaluate our model on the nuPlan planning task and observe that it performs competitively with existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z)
- PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes [41.47703182059505]
We propose a visual SLAM system based on planar features designed for planar ambiguous scenes.
We present an integrated data association strategy that combines plane parameters, semantic information, projection IoU, and non-parametric tests.
Finally, we design a set of multi-constraint factor graphs for camera pose optimization.
arXiv Detail & Related papers (2024-02-09T01:34:26Z)
- Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints [56.283944756315066]
We propose an alternative TAMP approach that unifies task and motion planning into a single search.
Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI search to yield physically feasible plans.
arXiv Detail & Related papers (2023-12-29T14:00:20Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that combines multiple expert foundation models, each trained individually on language, vision, or action data, to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos [18.984980596601513]
We study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal.
Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision.
To avoid intermediate supervision annotation and error accumulation caused by planning autoregressively, we propose a diffusion-based framework.
arXiv Detail & Related papers (2023-03-26T10:50:16Z)
- Gradient-Based Mixed Planning with Discrete and Continuous Actions [34.885999774739055]
We propose a quadratic-based framework to simultaneously optimize continuous parameters and actions of candidate plans.
The framework is combined with a relaxation-based module that estimates the best candidate plan for transitioning from the initial state to the goal.
arXiv Detail & Related papers (2021-10-19T14:21:19Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)