ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
- URL: http://arxiv.org/abs/2312.07062v2
- Date: Thu, 14 Dec 2023 03:28:49 GMT
- Title: ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
- Authors: Guanxing Lu, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
- Abstract summary: Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complex surrounding environments.
We propose ThinkBot, which reasons over the thought chain in human instructions to recover missing action descriptions.
Our ThinkBot outperforms state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
- Score: 66.09880459084901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied Instruction Following (EIF) requires agents to complete human
instructions by interacting with objects in complex surrounding environments.
Conventional methods generate action plans directly from the sparse human
instructions, and they usually fail to achieve the human's goals because of
incoherence in the action descriptions. In contrast, we propose ThinkBot, which
reasons over the thought chain in human instructions to recover the missing
action descriptions, so that the agent can successfully complete the human's
goals by following coherent instructions. Specifically, we first design an
instruction completer based on large language models to recover the missing
actions and the objects to interact with between consecutive human
instructions, where the perceived surrounding environment and the completed
sub-goals are taken into account for instruction completion. Based on the
partially observed scene semantic maps, we then present an object localizer
that infers the positions of the objects to interact with, enabling agents to
achieve complex human goals. Extensive experiments in simulated environments
show that our ThinkBot outperforms state-of-the-art EIF methods by a sizable
margin in both success rate and execution efficiency.
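To make the described pipeline concrete, the following is a minimal Python sketch of the two components named in the abstract: an LLM-prompted instruction completer and a map-based object localizer. All names here (query_llm, complete_instruction, localize_object, the prompt wording) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def complete_instruction(prev_step, next_step, visible_objects, done_subgoals, query_llm):
    """Ask an LLM to recover the action descriptions missing between two
    consecutive human instructions (hypothetical prompt format)."""
    prompt = (
        "You are helping a household robot follow instructions.\n"
        f"Completed sub-goals: {', '.join(done_subgoals) or 'none'}\n"
        f"Objects currently perceived: {', '.join(visible_objects)}\n"
        f"Previous instruction: {prev_step}\n"
        f"Next instruction: {next_step}\n"
        "List the intermediate (object, action) steps needed to connect them."
    )
    return query_llm(prompt)  # any chat-completion backend could be plugged in here

def localize_object(semantic_map, object_id):
    """Return the most likely (row, col) cell of `object_id` on a partially
    observed semantic map of shape (num_classes, H, W)."""
    heatmap = semantic_map[object_id]
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

if __name__ == "__main__":
    # Toy usage with a stubbed LLM and a random 10-class, 20x20 semantic map.
    fake_llm = lambda prompt: ["(knife, pick up)", "(apple, slice)"]
    steps = complete_instruction(
        "walk to the kitchen counter",
        "put the apple slice in the fridge",
        visible_objects=["apple", "knife", "counter", "fridge"],
        done_subgoals=["walk to the kitchen counter"],
        query_llm=fake_llm,
    )
    semantic_map = np.random.rand(10, 20, 20)
    print(steps, localize_object(semantic_map, object_id=3))
```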
Related papers
- Infer Human's Intentions Before Following Natural Language Instructions [24.197496779892383]
We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative tasks.
Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps.
We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches.
arXiv Detail & Related papers (2024-09-26T17:19:49Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework comprising a high-level task planner and a low-level exploration controller.
The task planner generates feasible step-by-step plans for accomplishing the human's goal according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language captioning and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z) - Real-time Addressee Estimation: Deployment of a Deep-Learning Model on
the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z) - Proactive Human-Robot Interaction using Visuo-Lingual Transformers [0.0]
Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction.
We propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve.
arXiv Detail & Related papers (2023-10-04T00:50:21Z) - Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We train our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z) - SEAL: Semantic Frame Execution And Localization for Perceiving Afforded
Robot Actions [5.522839151632667]
We extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model.
For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot.
arXiv Detail & Related papers (2023-03-24T15:25:41Z) - A Persistent Spatial Semantic Representation for High-level Natural
Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)