Infer Human's Intentions Before Following Natural Language Instructions
- URL: http://arxiv.org/abs/2409.18073v1
- Date: Thu, 26 Sep 2024 17:19:49 GMT
- Title: Infer Human's Intentions Before Following Natural Language Instructions
- Authors: Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao, Natasha Jaques
- Abstract summary: We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative tasks.
Our framework makes explicit inferences about human goals and intentions as intermediate reasoning steps.
We empirically demonstrate that using social reasoning to explicitly infer human intentions before making action plans surpasses purely end-to-end approaches.
- Score: 24.197496779892383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For AI agents to be helpful to humans, they should be able to follow natural
language instructions to complete everyday cooperative tasks in human
environments. However, real human instructions inherently possess ambiguity,
because the human speakers assume sufficient prior knowledge about their hidden
goals and intentions. Standard language grounding and planning methods fail to
address such ambiguities because they do not model human internal goals as
additional partially observable factors in the environment. We propose a new
framework, Follow Instructions with Social and Embodied Reasoning (FISER),
aiming for better natural language instruction following in collaborative
embodied tasks. Our framework makes explicit inferences about human goals and
intentions as intermediate reasoning steps. We implement a set of
Transformer-based models and evaluate them over a challenging benchmark,
HandMeThat. We empirically demonstrate that using social reasoning to
explicitly infer human intentions before making action plans surpasses purely
end-to-end approaches. We also compare our implementation with strong
baselines, including Chain of Thought prompting on the largest available
pre-trained language models, and find that FISER provides better performance on
the embodied social reasoning tasks under investigation, reaching the
state-of-the-art on HandMeThat.
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z) - Situated Instruction Following [87.37244711380411]
We propose situated instruction following, which embraces the inherent underspecification and ambiguity of real-world communication.
The meaning of situated instructions naturally unfolds through the past actions and the expected future behaviors of the human involved.
Our experiments indicate that state-of-the-art Embodied Instruction Following (EIF) models lack holistic understanding of situated human intention.
arXiv Detail & Related papers (2024-07-15T19:32:30Z) - Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task [17.190635800969456]
In this paper, we examine using Large Language Models to infer human intention in a collaborative object categorization task with a physical robot.
We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions.
arXiv Detail & Related papers (2024-04-12T12:15:14Z) - ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complex surrounding environments.
We propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z) - HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z) - The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study how language drives and influences social reasoning in a probabilistic goal-inference domain.
We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios.
Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z) - "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z) - GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following [15.405156791794191]
Our goal is to enable a robot to learn how to sequence its actions to perform tasks specified as natural language instructions.
We introduce a novel neuro-symbolic model, GoalNet, for contextual and task dependent inference of goal predicates.
GoalNet demonstrates a significant improvement (51%) in the task completion rate in comparison to a state-of-the-art rule-based approach.
arXiv Detail & Related papers (2022-05-14T15:14:40Z) - Inverse Reinforcement Learning with Natural Language Goals [8.972202854038382]
We propose a novel inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function.
Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset.
arXiv Detail & Related papers (2020-08-16T14:43:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.