Related papers: Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty

URL: http://arxiv.org/abs/2312.01097v1
Date: Sat, 2 Dec 2023 10:07:17 GMT
Title: Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
Authors: Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang, Feng Gao
Abstract summary: Task planning for embodied AI has been one of the most challenging problems. We propose a task-agnostic method named 'planning as in-painting'. The proposed framework achieves promising performance in various embodied AI tasks.
Score: 56.30846158280031
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Task planning for embodied AI has been one of the most challenging problems, and the community has not reached a consensus on its formulation. In this paper, we aim to tackle this problem with a unified framework consisting of an end-to-end trainable method and a planning algorithm. In particular, we propose a task-agnostic method named 'planning as in-painting'. In this method, we use a Denoising Diffusion Model (DDM) for plan generation, conditioned on both language instructions and perceptual inputs under partially observable environments. Partial observation often leads the model to hallucinate plans. Therefore, our diffusion-based method jointly models both state trajectory and goal estimation to improve the reliability of the generated plan, given the limited information available at each step. To better leverage information newly discovered during plan execution for a higher success rate, we propose an on-the-fly planning algorithm that works together with the diffusion-based planner. The proposed framework achieves promising performance in various embodied AI tasks, including vision-language navigation, object manipulation, and task planning in a photorealistic virtual environment. The code is available at: https://github.com/joeyy5588/planning-as-inpainting.

Related papers

Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs). We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption. Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z)
GenPlan: Generative Sequence Models as Adaptive Planners [0.0]
Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations. However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks. We propose GenPlan: a discrete-flow model for adaptive planning, enabling sample-generative exploration and exploitation.
arXiv Detail & Related papers (2024-12-11T17:32:33Z)
PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models. We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal. We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
Behaviour Planning: A Toolkit for Diverse Planning [1.2213833413853037]
We introduce Behaviour Planning, a diverse planning toolkit that can generate diverse plans based on modular diversity models. We present a qualitative framework for describing diversity models, a planning approach for generating plans aligned with any given diversity model, and a practical implementation of an SMT-based behaviour planner.
arXiv Detail & Related papers (2024-05-07T13:18:22Z)
Path Planning based on 2D Object Bounding-box [8.082514573754954]
We present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios. This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras. We evaluate our model on the nuPlan planning task and observe that it performs competitively in comparison to existing vision-centric methods.
arXiv Detail & Related papers (2024-02-22T19:34:56Z)
PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes [41.47703182059505]
We propose a visual SLAM system based on planar features designed for planar ambiguous scenes. We present an integrated data association strategy that combines plane parameters, semantic information, projection IoU, and non-parametric tests. Finally, we design a set of multi-constraint factor graphs for camera pose optimization.
arXiv Detail & Related papers (2024-02-09T01:34:26Z)
Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints [56.283944756315066]
We propose an alternative TAMP approach that unifies task and motion planning into a single search. Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI search to yield physically feasible plans.
arXiv Detail & Related papers (2023-12-29T14:00:20Z)
Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that combines multiple expert foundation models, each trained individually on language, vision, or action data, to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos [18.984980596601513]
We study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision. To avoid intermediate supervision annotation and error accumulation caused by planning autoregressively, we propose a diffusion-based framework.
arXiv Detail & Related papers (2023-03-26T10:50:16Z)
Position Paper: Online Modeling for Offline Planning [2.8326418377665346]
A key part of AI planning research is the representation of action models. Despite the maturity of the field, AI planning technology is still rarely used outside the research community. We argue that this is because the modeling process is assumed to have taken place and completed prior to the planning process.
arXiv Detail & Related papers (2022-06-07T14:48:08Z)
Gradient-Based Mixed Planning with Discrete and Continuous Actions [34.885999774739055]
We propose a quadratic-based framework to simultaneously optimize continuous parameters and actions of candidate plans. The framework is combined with a module that uses relaxation to estimate the best candidate plan for transitioning from the initial state to the goal.
arXiv Detail & Related papers (2021-10-19T14:21:19Z)
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption. We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
Hallucinative Topological Memory for Zero-Shot Visual Planning [86.20780756832502]
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline. Most previous works on VP approached the problem by planning in a learned latent space, resulting in low-quality visual plans. Here, we propose a simple VP method that plans directly in image space and displays competitive performance.
arXiv Detail & Related papers (2020-02-27T18:54:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.