RMM: A Recursive Mental Model for Dialog Navigation
- URL: http://arxiv.org/abs/2005.00728v2
- Date: Tue, 6 Oct 2020 02:16:27 GMT
- Title: RMM: A Recursive Mental Model for Dialog Navigation
- Authors: Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz,
Jianfeng Gao
- Abstract summary: Language-guided robots must be able to both ask humans questions and understand answers.
Inspired by theory of mind, we propose the Recursive Mental Model (RMM).
We demonstrate that RMM enables better generalization to novel environments.
- Score: 102.42641990401735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language-guided robots must be able to both ask humans questions and
understand answers. Much existing work focuses only on the latter. In this
paper, we go beyond instruction following and introduce a two-agent task where
one agent navigates and asks questions that a second, guiding agent answers.
Inspired by theory of mind, we propose the Recursive Mental Model (RMM). The
navigating agent models the guiding agent to simulate answers given candidate
generated questions. The guiding agent in turn models the navigating agent to
simulate navigation steps it would take to generate answers. We use the
progress agents make towards the goal as a reinforcement learning reward signal
to directly inform not only navigation actions, but also both question and
answer generation. We demonstrate that RMM enables better generalization to
novel environments. Interlocutor modelling may be a way forward for human-agent
dialogue where robots need to both ask and answer questions.
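
To make the recursive simulation concrete, below is a minimal runnable sketch of the question-selection step the abstract describes: the navigator samples candidate questions, uses its internal model of the guide to simulate an answer for each, imagines the navigation that answer would induce, and keeps the question whose rollout makes the most progress toward the goal. The grid world, stub models, and distance-based scoring are illustrative assumptions only; in the paper these components are learned models trained with the progress-based RL reward, not hand-coded rules.

```python
"""Illustrative sketch of the RMM question-selection loop. All classes,
names, and the toy grid world here are hypothetical placeholders, not the
authors' implementation."""

import random

GOAL = (4, 4)  # assumed goal location in a toy grid world


def dist(pos, goal=GOAL):
    # Manhattan distance stands in for "distance to goal" progress measure.
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])


def progress_reward(before, after):
    # Progress toward the goal: positive when the agent gets closer.
    return dist(before) - dist(after)


class StubGuideModel:
    """Navigator's internal model of the guide: given a candidate question,
    simulate the answer the guide would give (here, a direction hint)."""

    def simulate_answer(self, pos, question):
        dx, dy = GOAL[0] - pos[0], GOAL[1] - pos[1]
        if "left/right" in question:
            return "right" if dx > 0 else "left"
        return "up" if dy > 0 else "down"


class StubNavigator:
    """Generates candidate questions and simulates following an answer."""

    QUESTIONS = ["Should I go left/right?", "Should I go up/down?"]

    def sample_questions(self, k=2):
        return random.sample(self.QUESTIONS, k)

    def simulate_steps(self, pos, answer):
        moves = {"left": (-1, 0), "right": (1, 0),
                 "down": (0, -1), "up": (0, 1)}
        dx, dy = moves[answer]
        return (pos[0] + dx, pos[1] + dy)


def choose_question(nav, guide_model, pos):
    # Recursive-mental-model step: score each candidate question by the
    # progress of the rollout its *simulated* answer would induce.
    best_q, best_r = None, float("-inf")
    for q in nav.sample_questions():
        answer = guide_model.simulate_answer(pos, q)
        r = progress_reward(pos, nav.simulate_steps(pos, answer))
        if r > best_r:
            best_q, best_r = q, r
    return best_q


if __name__ == "__main__":
    nav, guide = StubNavigator(), StubGuideModel()
    print(choose_question(nav, guide, (0, 0)))
```

In the paper, the same progress signal also trains the guiding agent, which symmetrically models the navigator to simulate the steps its candidate answers would trigger.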
Related papers
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning in AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments compared with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- R2H: Building Multimodal Navigation Helpers that Respond to Help Requests [30.695642371684663]
We first introduce a novel benchmark, Respond to Help Requests (R2H), to promote the development of multimodal navigation helpers.
R2H mainly includes two tasks: (1) Respond to Dialog History (RDH), which assesses the helper agent's ability to generate informative responses based on a given dialog history, and (2) Respond during Interaction (RdI), which evaluates the effectiveness and efficiency of the response during consistent cooperation with a task performer.
arXiv Detail & Related papers (2023-05-23T17:12:09Z)
- Lana: A Language-Capable Navigator for Instruction Following and Generation [70.76686546473994]
LANA is a language-capable navigation agent which is able to execute human-written navigation commands and provide route descriptions to humans.
We empirically verify that, compared with recent advanced task-specific solutions, LANA attains better performance on both instruction following and route description.
In addition, endowed with language generation capability, LANA can explain its behavior to humans and assist them in wayfinding.
arXiv Detail & Related papers (2023-03-15T07:21:28Z)
- VLN-Trans: Translator for the Vision and Language Navigation Agent [23.84492755669486]
We design a translator module for the navigation agent to convert the original instructions into easy-to-follow sub-instruction representations.
We create a new synthetic sub-instruction dataset and design specific tasks to train the translator and the navigation agent.
We evaluate our approach on the Room2Room (R2R), Room4Room (R4R), and Room2Room Last (R2R-Last) datasets.
arXiv Detail & Related papers (2023-02-18T04:19:51Z)
- INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions [47.90088587508672]
InSCIt is a dataset for Information-Seeking Conversations with mixed-initiative Interactions.
It contains 4.7K user-agent turns from 805 human-human conversations.
We report results of two systems based on state-of-the-art models of conversational knowledge identification and open-domain question answering.
arXiv Detail & Related papers (2022-07-02T06:18:12Z)
- Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation [117.26891277593205]
We focus on navigation and address the problem that existing navigation algorithms lack experience and common sense.
Inspired by the human ability to think twice before moving and to conceive several feasible paths when seeking a goal in unfamiliar scenes, we present a route planning method named the Path Estimation and Memory Recalling (PEMR) framework.
We show strong experimental results of PEMR on the EmbodiedQA navigation task.
arXiv Detail & Related papers (2021-10-16T13:30:55Z)
- Explore and Explain: Self-supervised Navigation and Recounting [43.52107532692226]
We devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path.
In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes.
Our model integrates a novel self-supervised exploration module with penalty, and a fully-attentive captioning model for explanation.
arXiv Detail & Related papers (2020-07-14T18:00:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.