Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty
- URL: http://arxiv.org/abs/2312.01097v1
- Date: Sat, 2 Dec 2023 10:07:17 GMT
- Title: Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty
- Authors: Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang,
Feng Gao
- Abstract summary: Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
- Score: 56.30846158280031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task planning for embodied AI has been one of the most challenging problems,
as the community has yet to reach a consensus on its formulation. In this
paper, we aim to tackle this problem with a unified framework consisting of an
end-to-end trainable method and a planning algorithm. Particularly, we propose
a task-agnostic method named 'planning as in-painting'. In this method, we use
a Denoising Diffusion Model (DDM) for plan generation, conditioned on both
language instructions and perceptual inputs under partially observable
environments. Partial observation often leads the model to hallucinate plans.
Therefore, our diffusion-based method jointly models both state
trajectory and goal estimation to improve the reliability of the generated
plan, given the limited available information at each step. To better leverage
newly discovered information along the plan execution for a higher success
rate, we propose an on-the-fly planning algorithm to collaborate with the
diffusion-based planner. The proposed framework achieves promising performance
in various embodied AI tasks, including vision-language navigation, object
manipulation, and task planning in a photorealistic virtual environment. The
code is available at: https://github.com/joeyy5588/planning-as-inpainting.
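To make the two components of the abstract concrete, here is a minimal, hypothetical sketch of a conditional DDPM sampler that jointly denoises a state trajectory and a goal estimate, wrapped in an on-the-fly replanning loop. The `denoiser`/`env` interfaces, tensor shapes, and noise schedule are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Minimal sketch (not the authors' code): a DDPM-style sampler that jointly
# denoises a state trajectory and a goal estimate, conditioned on a language
# instruction and the current partial observation, plus an on-the-fly loop
# that replans after every executed step.
import torch

NUM_STEPS = 50
betas = torch.linspace(1e-4, 0.02, NUM_STEPS)          # standard DDPM schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_plan(denoiser, instr_emb, obs, horizon, state_dim):
    """Jointly sample (trajectory, goal estimate); the goal is appended as an
    extra row so the model denoises both together, which is what the abstract
    credits with reducing hallucination under partial observability."""
    x = torch.randn(horizon + 1, state_dim)            # last row = goal estimate
    for t in reversed(range(NUM_STEPS)):
        eps = denoiser(x, t, instr_emb, obs)           # predicted noise
        x = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x[:-1], x[-1]

def plan_on_the_fly(env, denoiser, instr_emb, horizon=16, state_dim=8,
                    max_steps=100):
    """Receding-horizon execution: replan after each step so newly revealed
    information conditions the next denoising pass."""
    obs = env.reset()
    for _ in range(max_steps):
        traj, goal = sample_plan(denoiser, instr_emb, obs, horizon, state_dim)
        obs, done = env.step(traj[0])                  # execute first waypoint only
        if done:
            return True
    return False
```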
Related papers
- PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments remains a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
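A hedged sketch of the iterative loop this summary describes: when the partial world model does not yet support a plan to the final goal, plan toward a sub-goal, execute, and grow the representation from what is observed. `call_planner`, `choose_subgoal`, and the environment interface are hypothetical stand-ins, not PDDLEGO's actual API.

```python
# Illustrative only: alternate between planning, acting, and refining a
# partial PDDL problem, in the spirit of the PDDLEGO summary above.
def pddlego_loop(env, goal, call_planner, choose_subgoal, max_iters=20):
    problem = env.initial_pddl_problem()          # partial world model
    for _ in range(max_iters):
        plan = call_planner(problem, goal)        # may fail: model is incomplete
        if plan is None:
            subgoal = choose_subgoal(problem)     # e.g. "open an unexplored door"
            plan = call_planner(problem, subgoal) # partial plan toward sub-goal
        for action in plan or []:
            obs = env.execute(action)
            problem = problem.updated_with(obs)   # grow the representation
        if env.goal_satisfied(goal):
            return True
    return False
```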
- Behaviour Planning: A Toolkit for Diverse Planning [1.2213833413853037]
We introduce Behaviour Planning, a diverse planning toolkit that can generate diverse plans based on modular diversity models.
We present a qualitative framework for describing diversity models, a planning approach for generating plans aligned with any given diversity model, and a practical implementation of an SMT-based behaviour planner.
arXiv Detail & Related papers (2024-05-07T13:18:22Z)
- Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios.
This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras.
We evaluate our model on the nuPlan planning task and observe that it performs competitively in comparison to existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z)
- PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes [41.47703182059505]
We propose a visual SLAM system based on planar features designed for planar ambiguous scenes.
We present an integrated data association strategy that combines plane parameters, semantic information, projection IoU, and non-parametric tests.
Finally, we design a set of multi-constraint factor graphs for camera pose optimization.
arXiv Detail & Related papers (2024-02-09T01:34:26Z)
- Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints [56.283944756315066]
We propose an alternative TAMP approach that unifies task and motion planning into a single search.
Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI search to yield physically feasible plans.
arXiv Detail & Related papers (2023-12-29T14:00:20Z)
- Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that leverages expert foundation models, trained individually on language, vision, and action data, together to solve long-horizon tasks.
We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model.
Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
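The three-stage decomposition in this summary maps naturally onto a pipeline. The sketch below is an assumption-laden illustration; the `llm`, `video_model`, `inverse_dynamics`, and `robot` handles are placeholders, not the released interfaces.

```python
# Illustrative pipeline: symbolic plan -> video plan -> low-level actions.
def hierarchical_execute(task_description, first_image,
                         llm, video_model, inverse_dynamics, robot):
    # 1. Language level: an LLM proposes a symbolic, step-by-step plan.
    symbolic_plan = llm.propose_plan(task_description, first_image)
    image = first_image
    for step in symbolic_plan:
        # 2. Visual level: a video diffusion model grounds the step as frames.
        frames = video_model.generate(step, image)
        # 3. Control level: an inverse dynamics model reads actions off
        #    consecutive frames, which the robot then executes.
        for frame, next_frame in zip(frames, frames[1:]):
            action = inverse_dynamics(frame, next_frame)
            image = robot.execute(action)
```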
- Position Paper: Online Modeling for Offline Planning [2.8326418377665346]
A key part of AI planning research is the representation of action models.
Despite the maturity of the field, AI planning technology is still rarely used outside the research community.
We argue that this is because the modeling process is assumed to have been completed before planning begins.
arXiv Detail & Related papers (2022-06-07T14:48:08Z)
- Gradient-Based Mixed Planning with Discrete and Continuous Actions [34.885999774739055]
We propose a quadratic-based framework to simultaneously optimize continuous parameters and actions of candidate plans.
The framework is combined with a module that estimates, based on relaxation, the best candidate plan for transitioning from the initial state to the goal.
arXiv Detail & Related papers (2021-10-19T14:21:19Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
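To make the "planning order" idea concrete, here is a toy divide-and-conquer recursion: pick an intermediate sub-goal, solve the two halves independently, and only fall back to a direct connection at the base case. The sub-goal proposer and low-level planner are hypothetical; the paper guides the sub-goal choice with MCTS rather than the plain loop shown.

```python
# Toy divide-and-conquer planner (illustrative; the actual DC-MCTS guides the
# sub-goal search with Monte Carlo Tree Search instead of a plain loop).
def dc_plan(start, goal, propose_subgoals, connect, depth=0, max_depth=4):
    path = connect(start, goal)                   # try a direct low-level plan
    if path is not None or depth >= max_depth:
        return path
    for mid in propose_subgoals(start, goal):     # MCTS would rank these
        left = dc_plan(start, mid, propose_subgoals, connect,
                       depth + 1, max_depth)
        if left is None:
            continue
        right = dc_plan(mid, goal, propose_subgoals, connect,
                        depth + 1, max_depth)
        if right is not None:
            return left + right                   # concatenate the two halves
    return None
```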
- Hallucinative Topological Memory for Zero-Shot Visual Planning [86.20780756832502]
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline.
Most previous works on VP approached the problem by planning in a learned latent space, resulting in low-quality visual plans.
Here, we propose a simple VP method that plans directly in image space and displays competitive performance.
arXiv Detail & Related papers (2020-02-27T18:54:42Z)