Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following
- URL: http://arxiv.org/abs/2404.15190v1
- Date: Sun, 21 Apr 2024 08:10:20 GMT
- Title: Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following
- Authors: Suyeon Shin, Sujin Jeon, Junghyun Kim, Gi-Cheon Kang, Byoung-Tak Zhang
- Abstract summary: Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments.
One of the primary challenges in EIF is compositional task planning, which is often addressed with supervised or in-context learning with labeled data.
We introduce the Socratic Planner, the first zero-shot planning method that infers without the need for any training data.
- Score: 17.608330952846075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments. One of the primary challenges in EIF is compositional task planning, which is often addressed with supervised or in-context learning with labeled data. To this end, we introduce the Socratic Planner, the first zero-shot planning method that infers without the need for any training data. The Socratic Planner first decomposes the instructions into substructural information of the task through self-questioning and answering, translating it into a high-level plan, i.e., a sequence of subgoals. Subgoals are executed sequentially, with our visually grounded re-planning mechanism adjusting plans dynamically through dense visual feedback. We also introduce an evaluation metric for high-level plans, RelaxedHLP, for a more comprehensive evaluation. Experiments demonstrate the effectiveness of the Socratic Planner, achieving competitive performance on both zero-shot and few-shot task planning in the ALFRED benchmark, particularly excelling in tasks requiring higher-dimensional inference. Additionally, precise adjustments to the plan were achieved by incorporating environmental visual information.
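The abstract outlines a three-step flow: self-questioning to decompose the instruction into task substructure, translation of that analysis into a sequence of subgoals, and sequential execution with visually grounded re-planning. The sketch below is a minimal, hypothetical illustration of that flow only; the helpers `query_llm`, `describe_scene`, and `execute_subgoal`, and all prompts, are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an inquiry-based zero-shot planning loop, loosely
# following the flow described in the abstract. All helper callables
# (query_llm, describe_scene, execute_subgoal) are assumed stand-ins.
from typing import Callable, List


def decompose_by_self_questioning(instruction: str,
                                  query_llm: Callable[[str], str]) -> str:
    """Have the LLM question and answer itself about the task's substructure."""
    prompt = (
        "Instruction: " + instruction + "\n"
        "Generate clarifying questions about the target objects, locations, "
        "and required state changes, then answer them yourself."
    )
    return query_llm(prompt)


def plan_subgoals(instruction: str, task_info: str,
                  query_llm: Callable[[str], str]) -> List[str]:
    """Translate the self-Q&A into a high-level plan: one subgoal per line."""
    prompt = (
        "Instruction: " + instruction + "\n"
        "Task analysis:\n" + task_info + "\n"
        "List the remaining subgoals needed to complete the task, one per line."
    )
    return [line.strip() for line in query_llm(prompt).splitlines() if line.strip()]


def run(instruction: str,
        query_llm: Callable[[str], str],
        describe_scene: Callable[[], str],
        execute_subgoal: Callable[[str], bool],
        max_replans: int = 3) -> None:
    """Execute subgoals in order; on failure, re-plan using visual feedback."""
    task_info = decompose_by_self_questioning(instruction, query_llm)
    subgoals = plan_subgoals(instruction, task_info, query_llm)
    i, replans = 0, 0
    while i < len(subgoals):
        if execute_subgoal(subgoals[i]):
            i += 1
            continue
        if replans >= max_replans:
            break  # give up after a few re-planning attempts
        # Visually grounded re-planning: feed a scene description back to the
        # LLM together with the completed subgoals and regenerate the rest.
        task_info += (
            "\nCompleted subgoals: " + "; ".join(subgoals[:i]) +
            "\nObserved scene: " + describe_scene()
        )
        subgoals = subgoals[:i] + plan_subgoals(instruction, task_info, query_llm)
        replans += 1
```

In a real system, `query_llm` would wrap an LLM API call and `describe_scene`/`execute_subgoal` would be backed by the embodied environment (e.g., the ALFRED simulator); the paper's RelaxedHLP evaluation of the generated plan is not reflected in this sketch.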
Related papers
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs)
We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.
Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z) - Introduction to AI Planning [0.0]
The notes begin by introducing the state model and move on to exploring classical planning.
The most extensive section is dedicated to Hierarchical Task Network (HTN) planning.
The lecture notes end with a bonus chapter on the Planning Domain Definition Language (PDDL).
arXiv Detail & Related papers (2024-12-16T10:38:04Z) - Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling [23.62433580021779]
We advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached.
A nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning.
Our method is evaluated on the VirtualHome-Env benchmark, showing advanced performance with better scaling for inference.
arXiv Detail & Related papers (2024-10-02T11:42:49Z) - Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos [48.15438373870542]
VidAssist is an integrated framework designed for zero/few-shot goal-oriented planning in instructional videos.
It employs a breadth-first search algorithm for optimal plan generation.
Experiments demonstrate that VidAssist offers a unified framework for different goal-oriented planning setups.
arXiv Detail & Related papers (2024-09-30T17:57:28Z) - Learning adaptive planning representations with natural language guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations.
Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'
The proposed framework achieves promising performances in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation [114.50719922069261]
We propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text)
EIPE-text has three stages: plan extraction, learning, and inference.
We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling.
arXiv Detail & Related papers (2023-10-12T10:21:37Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) in embodied tasks for grounded planning with physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought [95.37585041654535]
Embodied AI is capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments.
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI.
Experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering.
arXiv Detail & Related papers (2023-05-24T11:04:30Z) - Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning [45.51792981370957]
We investigate the applicability of a smaller class of large language models (LLMs) in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially.
Our method grounds the input of the LLM on the domain that is represented as a scene graph, enabling it to translate human requests into executable robot plans.
Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
arXiv Detail & Related papers (2023-05-12T18:14:32Z) - Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show this planning approach is experimentally very competitive to classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)