Multimodal Contextualized Plan Prediction for Embodied Task Completion
- URL: http://arxiv.org/abs/2305.06485v1
- Date: Wed, 10 May 2023 22:29:12 GMT
- Title: Multimodal Contextualized Plan Prediction for Embodied Task Completion
- Authors: Mert İnan, Aishwarya Padmakumar, Spandana Gella, Patrick Lange, Dilek Hakkani-Tur
- Abstract summary: Task planning is an important component of traditional robotics systems, enabling robots to compose fine-grained skills to perform more complex tasks.
Recent work building systems for translating natural language to executable actions for task completion in simulated embodied agents has focused on directly predicting low-level action sequences.
We focus on predicting a higher-level plan representation for one such embodied task completion dataset, TEACh.
- Score: 9.659463406886301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task planning is an important component of traditional robotics systems,
enabling robots to compose fine-grained skills to perform more complex tasks.
Recent work building systems for translating natural language to executable
actions for task completion in simulated embodied agents has focused on directly
predicting low-level action sequences that would be expected to be directly
executable by a physical robot. In this work, we instead focus on predicting a
higher-level plan representation for one such embodied task completion dataset,
TEACh, under the assumption that techniques for high-level plan prediction
from natural language are expected to be more transferable to physical robot
systems. We demonstrate that better plans can be predicted using multimodal
context, and that plan prediction and plan execution modules are likely
dependent on each other, so it may not be ideal to fully decouple them.
Further, we benchmark execution of oracle plans to quantify the scope for
improvement in plan prediction models.
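To make the distinction between the two prediction targets concrete, a high-level plan can be thought of as a short sequence of (action, object) steps, each of which an execution module expands into low-level commands. The sketch below is purely illustrative: the action names, object names, and expansion logic are hypothetical and are not taken from the TEACh API or from this paper's models.

```python
# Illustrative sketch (not the TEACh API): a high-level plan is a short
# sequence of (action, object) steps, while an executor expands each step
# into low-level commands a physical robot could run directly.

from typing import List, Tuple

PlanStep = Tuple[str, str]   # e.g. ("Pickup", "Mug")
LowLevelAction = str         # e.g. "Forward", "Pickup Mug"

# A hypothetical high-level plan for a "make coffee" task.
plan: List[PlanStep] = [
    ("Pickup", "Mug"),
    ("Place", "CoffeeMachine"),
    ("ToggleOn", "CoffeeMachine"),
]

def execute_step(step: PlanStep) -> List[LowLevelAction]:
    """Expand one high-level step into a placeholder low-level sequence.

    A real executor would issue navigation and manipulation primitives
    conditioned on the agent's egocentric observations.
    """
    action, obj = step
    return [f"NavigateTo {obj}", f"{action} {obj}"]

if __name__ == "__main__":
    for step in plan:
        print(step, "->", execute_step(step))
```

Predicting the short upper sequence from dialogue and visual context, rather than the expanded low-level one, is the plan prediction problem studied in this paper.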
Related papers
- Joint Verification and Refinement of Language Models for Safety-Constrained Planning [21.95203475140736]
We develop a method to generate executable plans and formally verify them against task-relevant safety specifications.
Given a high-level task description in natural language, the proposed method queries a language model to generate plans in the form of executable robot programs.
It then converts the generated plan into an automaton-based representation, allowing formal verification of the automaton against the specifications.
arXiv Detail & Related papers (2024-10-18T21:16:30Z)
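As a generic illustration of the verification idea summarized above (not the paper's actual pipeline), a linear plan can be checked against a safety specification by running its actions through a small finite-state monitor. The states, actions, and safety rule below are hypothetical.

```python
# Generic sketch of automaton-style verification of a linear plan
# (illustrative only; the states, actions, and safety rule are made up).
# Safety rule: the stove must never be turned on while a towel is on it.

UNSAFE = "unsafe"

def monitor_step(state: str, action: str) -> str:
    """Advance the monitor automaton by one plan action."""
    if state == UNSAFE:
        return UNSAFE
    if action == "place_towel_on_stove":
        return "towel_on_stove"
    if action == "remove_towel_from_stove":
        return "stove_clear"
    if action == "turn_on_stove" and state == "towel_on_stove":
        return UNSAFE
    return state

def plan_is_safe(plan, initial_state="stove_clear"):
    state = initial_state
    for action in plan:
        state = monitor_step(state, action)
    return state != UNSAFE

print(plan_is_safe(["place_towel_on_stove", "turn_on_stove"]))    # False
print(plan_is_safe(["place_towel_on_stove", "remove_towel_from_stove",
                    "turn_on_stove"]))                            # True
```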
- Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling [23.62433580021779]
We advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached.
A nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning.
Our method is evaluated on the VirtualHome-Env benchmark, showing strong performance and better scaling at inference time.
arXiv Detail & Related papers (2024-10-02T11:42:49Z)
- Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos [48.15438373870542]
VidAssist is an integrated framework designed for zero/few-shot goal-oriented planning in instructional videos.
It employs a breadth-first search algorithm for optimal plan generation.
Experiments demonstrate that VidAssist offers a unified framework for different goal-oriented planning setups.
arXiv Detail & Related papers (2024-09-30T17:57:28Z)
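The VidAssist entry above mentions a breadth-first search for plan generation. The snippet below is a minimal, generic BFS over candidate action sequences with a toy action set and goal test; it is not the paper's implementation.

```python
# Generic breadth-first search over candidate plans (illustrative only;
# the action vocabulary and goal test are toy stand-ins).

from collections import deque

ACTIONS = ["crack_egg", "whisk_egg", "heat_pan", "pour_egg", "serve"]

def goal_reached(plan):
    # Toy goal test: a fixed target sequence of steps.
    return plan == ["crack_egg", "whisk_egg", "heat_pan", "pour_egg"]

def bfs_plan(max_depth=4):
    """Return the shortest action sequence satisfying the goal test."""
    queue = deque([[]])           # start from the empty plan
    while queue:
        plan = queue.popleft()
        if goal_reached(plan):
            return plan
        if len(plan) < max_depth:
            for action in ACTIONS:
                queue.append(plan + [action])
    return None

print(bfs_plan())
```

In a real system the goal test and candidate scoring would come from a learned model rather than a hand-written check.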
- Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (CEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z)
- Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction [11.614036749291216]
We introduce a new distributed multi-robot planner, S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS, that is capable of achieving user-defined mission success rates.
We show, both theoretically and empirically, that the proposed planner can achieve user-specified task success rates while minimizing the overall number of help requests.
arXiv Detail & Related papers (2024-02-23T15:02:44Z)
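For readers unfamiliar with the conformal prediction machinery referenced above, the sketch below shows the standard split-conformal calibration step plus a simple ask-for-help rule on top of it. The scores and probabilities are synthetic, and this is not the S-ATLAS planner itself.

```python
# Split conformal calibration plus an ask-for-help rule (synthetic data).
import math
import random

random.seed(0)

# Calibration scores: nonconformity of the correct next action under the
# planner's distribution (here, 1 - predicted probability of the true action).
calib_scores = [random.betavariate(2, 8) for _ in range(500)]

alpha = 0.05                 # allow the true action to be missed <= 5% of the time
n = len(calib_scores)
rank = math.ceil((n + 1) * (1 - alpha))        # finite-sample corrected quantile
qhat = sorted(calib_scores)[min(rank, n) - 1]

def prediction_set(action_probs):
    """All actions whose nonconformity score (1 - prob) falls below the threshold."""
    return {a for a, p in action_probs.items() if 1 - p <= qhat}

# At execution time: act autonomously only if the set is a singleton, else ask.
probs = {"pick_up_cup": 0.90, "open_fridge": 0.07, "wait": 0.03}
actions = prediction_set(probs)
print(actions, "-> execute" if len(actions) == 1 else "-> ask for help")
```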
- Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability.
This paper introduces an innovative approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z)
- Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z)
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought [95.37585041654535]
Embodied AI is capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments.
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI.
Experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering.
arXiv Detail & Related papers (2023-05-24T11:04:30Z)
- A Framework for Neurosymbolic Robot Action Planning using Large Language Models [3.0501524254444767]
We present a framework aimed at bridging the gap between symbolic task planning and machine learning approaches.
The rationale is to train Large Language Models (LLMs) into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL).
Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce the average waiting time for plan availability by up to 61.4%.
arXiv Detail & Related papers (2023-03-01T11:54:22Z)
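To make the PDDL compatibility mentioned above more concrete, the toy sketch below validates a candidate plan by simulating symbolic preconditions and effects, which is the kind of check a PDDL-style planner relies on. The domain, operators, and predicates are made up for illustration and are not the paper's framework.

```python
# Toy PDDL-style plan validation: each operator has preconditions that must
# hold and add/delete effects that update the symbolic state.

OPERATORS = {
    "pick(mug)": {"pre": {"hand_empty", "mug_on_table"},
                  "add": {"holding_mug"},
                  "del": {"hand_empty", "mug_on_table"}},
    "place(mug, machine)": {"pre": {"holding_mug"},
                            "add": {"mug_in_machine", "hand_empty"},
                            "del": {"holding_mug"}},
    "brew()": {"pre": {"mug_in_machine"},
               "add": {"coffee_ready"},
               "del": set()},
}

def validate(plan, init, goal):
    """Simulate the plan symbolically; return (is_valid, final_state)."""
    state = set(init)
    for step in plan:
        op = OPERATORS[step]
        if not op["pre"] <= state:           # an unmet precondition
            return False, state
        state = (state - op["del"]) | op["add"]
    return goal <= state, state

init = {"hand_empty", "mug_on_table"}
goal = {"coffee_ready"}
print(validate(["pick(mug)", "place(mug, machine)", "brew()"], init, goal))
print(validate(["brew()"], init, goal))      # fails: mug not in the machine yet
```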
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation to work across situated environments.
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
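The ProgPrompt entry above refers to a programmatic prompt structure. The sketch below shows one generic way such a prompt can be assembled, with scene objects and an example task rendered as Python source for an LLM to complete; the helper names and layout are hypothetical rather than ProgPrompt's exact format.

```python
# Generic sketch of a programmatic planning prompt (hypothetical layout):
# scene objects and an example task are rendered as Python source, and the
# LLM is asked to complete the target task function.

def build_prompt(objects, task_name):
    example = (
        "def throw_away_banana():\n"
        "    find('banana')\n"
        "    grab('banana')\n"
        "    find('trashcan')\n"
        "    put('banana', 'trashcan')\n"
    )
    header = f"objects = {objects}\n"
    target = f"def {task_name}():\n    # complete this function\n"
    return header + "\n" + example + "\n" + target

prompt = build_prompt(["mug", "coffee_machine", "sink"], "make_coffee")
print(prompt)
# The LLM's completion (a body of find/grab/put-style calls) would then be
# parsed and executed step by step in the target environment.
```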
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that is able to overcome these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.