Learning to Execute Actions or Ask Clarification Questions
- URL: http://arxiv.org/abs/2204.08373v2
- Date: Wed, 20 Apr 2022 19:35:37 GMT
- Title: Learning to Execute Actions or Ask Clarification Questions
- Authors: Zhengxiang Shi, Yue Feng, Aldo Lipani
- Abstract summary: We propose a new builder agent model capable of determining when to ask or execute instructions.
Experimental results show that our model achieves state-of-the-art performance on the collaborative building task.
- Score: 9.784428580459776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collaborative tasks are ubiquitous activities where a form of communication
is required in order to reach a joint goal. Collaborative building is one
such task. We wish to develop an intelligent builder agent in a simulated
building environment (Minecraft) that can build whatever users wish to build by
just talking to the agent. In order to achieve this goal, such agents need to
be able to take the initiative by asking clarification questions when further
information is needed. Existing works on the Minecraft Corpus Dataset only learn
to execute instructions, neglecting the importance of asking for clarification. In
this paper, we extend the Minecraft Corpus Dataset by annotating all builder
utterances into eight types, including clarification questions, and propose a
new builder agent model capable of determining when to ask or execute
instructions. Experimental results show that our model achieves
state-of-the-art performance on the collaborative building task with a
substantial improvement. We also define two new tasks, the learning to ask task
and the joint learning task. The latter consists of solving both the collaborative
building and the learning to ask tasks jointly.
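As a rough illustration of the ask-or-execute decision described above, here is a minimal sketch that frames the builder's choice as a binary classifier over the dialogue history. The `BuilderAgent` wrapper, the heuristic scorer, and the threshold are illustrative assumptions, not the paper's model; in the paper the decision is learned from the annotated utterance types.

```python
# Minimal ask-or-execute sketch (illustrative assumption, not the paper's model).
from dataclasses import dataclass
from typing import Callable

@dataclass
class BuilderAgent:
    # classify(utterances) -> P(ask | dialogue history); in the paper this role
    # is played by a learned model over the dialogue and the world state.
    classify: Callable[[list[str]], float]
    ask_threshold: float = 0.5

    def step(self, history: list[str], instruction: str) -> tuple[str, str]:
        p_ask = self.classify(history + [instruction])
        if p_ask >= self.ask_threshold:
            return "ask", "Which color should the blocks be?"
        return "execute", f"build({instruction!r})"

def toy_scorer(utterances: list[str]) -> float:
    """Heuristic stand-in for a learned classifier: vague wording -> ask."""
    vague = {"it", "there", "that", "some", "them"}
    return 0.9 if vague & set(utterances[-1].lower().split()) else 0.1

agent = BuilderAgent(classify=toy_scorer)
print(agent.step([], "put some blocks there"))           # ('ask', ...)
print(agent.step([], "place a red block at (1, 0, 2)"))  # ('execute', ...)
```

A learned model would replace the heuristic scorer, but the interface stays the same: given the dialogue so far, either emit a clarification question or execute the instruction.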
Related papers
- A Survey on Complex Tasks for Goal-Directed Interactive Agents [60.53915548970061]
This survey compiles relevant tasks and environments for evaluating goal-directed interactive agents.
An up-to-date compilation of relevant resources can be found on our project website.
arXiv Detail & Related papers (2024-09-27T08:17:53Z)
- OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion [39.14950571922401]
OAKINK2 is a dataset of bimanual object manipulation tasks for complex daily activities.
It introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task.
The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands, and various interacting objects.
arXiv Detail & Related papers (2024-03-28T13:47:19Z)
- When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution [6.36729066736314]
In collaborative tasks, effective communication is crucial for achieving joint goals.
We aim to develop an intelligent builder agent to build structures based on user input through dialogue.
arXiv Detail & Related papers (2023-05-09T20:23:17Z)
- Learning by Asking for Embodied Visual Navigation and Task Completion [20.0182240268864]
We propose an Embodied Learning-By-Asking (ELBA) model that learns when and what questions to ask to dynamically acquire additional information for completing the task.
Experimental results show that ELBA achieves improved task performance compared to baseline models without question-answering capabilities.
arXiv Detail & Related papers (2023-02-09T18:59:41Z)
- Language-guided Task Adaptation for Imitation Learning [40.1007184209417]
We introduce a novel setting wherein an agent must learn a task from a demonstration of a related task, with the difference between the tasks communicated in natural language.
The proposed setting allows reusing demonstrations from other tasks by providing low-effort language descriptions, and can also be used to provide feedback that corrects agent errors.
arXiv Detail & Related papers (2023-01-24T00:56:43Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate task generalization as a few-shot reinforcement learning problem where a task is characterized by a subtask graph (a toy sketch of this representation follows below).
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure, in the form of a subtask graph, from the training tasks.
Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks.
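To make the subtask-graph representation concrete, here is a toy sketch (an illustrative assumption, not MTSGI's learned representation or inference code): each subtask lists its precondition subtasks, and an agent can read off which subtasks are currently executable.

```python
# Toy subtask graph (illustrative; not MTSGI's learned representation).
# Each subtask maps to the list of subtasks that must be completed first.
SUBTASK_GRAPH = {
    "get_wood": [],
    "make_planks": ["get_wood"],
    "make_stick": ["make_planks"],
    "make_ladder": ["make_planks", "make_stick"],
}

def eligible(done: set[str]) -> list[str]:
    """Return subtasks whose preconditions are all satisfied and not yet done."""
    return [s for s, pre in SUBTASK_GRAPH.items()
            if s not in done and all(p in done for p in pre)]

print(eligible(set()))                         # ['get_wood']
print(eligible({"get_wood"}))                  # ['make_planks']
print(eligible({"get_wood", "make_planks"}))   # ['make_stick']
```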
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- Towards Collaborative Question Answering: A Preliminary Study [63.91687114660126]
We propose CollabQA, a novel QA task in which several expert agents, coordinated by a moderator, work together to answer questions that cannot be answered by any single agent alone (a toy sketch of this setup follows below).
We build a synthetic dataset from a large knowledge graph that can be distributed among experts.
We show that the problem can be challenging without introducing a prior on the collaboration structure, unless the experts are perfect and uniform.
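Here is a toy sketch of moderator-coordinated QA (an assumed setup for illustration, not the CollabQA protocol): each expert sees only its own shard of the knowledge graph, and the moderator polls the experts and returns the first answer it receives.

```python
# Toy moderator-coordinated QA (an assumed setup, not the CollabQA protocol).
from typing import Optional

class Expert:
    def __init__(self, facts: dict[str, str]):
        self.facts = facts  # this expert's private shard of the knowledge graph

    def answer(self, question: str) -> Optional[str]:
        return self.facts.get(question)  # None if outside this expert's shard

def moderator(question: str, experts: list[Expert]) -> str:
    for expert in experts:
        answer = expert.answer(question)
        if answer is not None:
            return answer
    return "unknown"

experts = [
    Expert({"capital_of(France)": "Paris"}),
    Expert({"author_of(Hamlet)": "Shakespeare"}),
]
print(moderator("author_of(Hamlet)", experts))  # only the second expert knows
```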
arXiv Detail & Related papers (2022-01-24T14:27:00Z)
- Learning to Guide and to Be Guided in the Architect-Builder Problem [0.0]
We are interested in interactive agents that learn to coordinate: a builder, which performs actions but does not know the goal of the task, and an architect, which guides the builder towards the goal.
We propose Architect-Builder Iterated Guiding (ABIG) as a solution to the Architect-Builder Problem.
ABIG results in a low-level, high-frequency guiding communication protocol that enables an architect-builder pair to solve the task at hand.
arXiv Detail & Related papers (2021-12-14T12:57:27Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide a single home for these missing dimensions of multi-agent, multi-task activity understanding, with meticulously designed settings.
We densely annotate the atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities.
We hope this effort will drive the machine vision community to examine goal-directed human activities and to further study task scheduling and assignment in the real world.
arXiv Detail & Related papers (2020-07-31T00:13:54Z)