Related papers: When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution

URL: http://arxiv.org/abs/2305.05754v1
Date: Tue, 9 May 2023 20:23:17 GMT
Title: When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Authors: Zhengxiang Shi, Jerome Ramos, To Eun Kim, Xi Wang, Hossein A. Rahmani, Aldo Lipani
Abstract summary: In collaborative tasks, effective communication is crucial for achieving joint goals. We aim to develop an intelligent builder agent to build structures based on user input through dialogue.
Score: 6.36729066736314
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In collaborative tasks, effective communication is crucial for achieving joint goals. One such task is collaborative building, where builders must communicate with each other to construct desired structures in a simulated environment such as Minecraft. We aim to develop an intelligent builder agent that builds structures based on user input through dialogue. However, in collaborative building, builders may encounter situations that are difficult to interpret from the available information and instructions, leading to ambiguity. In the NeurIPS 2022 Competition NLP Task, we address two key research questions, with the goal of filling this gap: when should the agent ask for clarification, and what clarification questions should it ask? We approach this goal through two sub-tasks, a classification task and a ranking task. For the classification task, the goal is to determine whether the agent should ask for clarification based on the current world state and dialogue history. For the ranking task, the goal is to rank the relevant clarification questions from a pool of candidates. In this report, we briefly introduce our methods for the classification and ranking tasks. For the classification task, our model achieves an F1 score of 0.757, placing 3rd on the leaderboard. For the ranking task, our model, which extends a traditional ranking model, achieves a Mean Reciprocal Rank (MRR) of about 0.38. Lastly, we discuss various neural approaches for the ranking task and future directions.
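The abstract reports an F1 score for the classification task and Mean Reciprocal Rank for the ranking task. The sketch below shows how these two metrics are commonly computed. It assumes binary labels where 1 means "ask for clarification" and exactly one gold clarification question per instance; both are assumptions, not the challenge's documented evaluation protocol, and `f1_score` and `mean_reciprocal_rank` are hypothetical helper names rather than an official evaluation script.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (assumed: 1 = "ask for clarification")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the gold question in each ranked list (0 if it is missing)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, gold):
        for rank, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Usage with toy data (not taken from the challenge):
print(f1_score([1, 0, 1, 1], [1, 1, 0, 1]))
print(mean_reciprocal_rank([["q2", "q1", "q3"], ["q1", "q3", "q2"]], ["q1", "q1"]))
```

In the toy example, precision and recall are both 2/3, so F1 is 2/3; the gold question sits at ranks 2 and 1, so MRR is (1/2 + 1/1) / 2 = 0.75.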

Related papers

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks. We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
arXiv Detail & Related papers (2026-01-28T13:49:18Z)
Divide-and-Conquer: Tree-structured Strategy with Answer Distribution Estimator for Goal-Oriented Visual Dialogue [30.126882554391837]
We propose a Tree-Structured Strategy with Answer Distribution Estimator (TSADE), which guides question generation by excluding half of the current candidate objects in each round. We experimentally demonstrate that our method enables agents to achieve high task-oriented accuracy with fewer repeated questions and rounds than traditional ergodic question generation approaches.
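The halving idea in this summary can be illustrated with a minimal sketch: each yes/no question asks whether the target lies in one half of the remaining candidates, so N candidates are narrowed to one in about ceil(log2 N) questions. This is a simplified illustration under stated assumptions; it omits TSADE's tree-structured strategy details and its answer distribution estimator, and `halving_dialogue` and the `oracle` callback are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def halving_dialogue(
    candidates: Sequence[str], oracle: Callable[[List[str]], bool]
) -> Tuple[str, int]:
    """Identify the target by repeatedly asking "is it in this half?".

    `oracle(subset)` is a hypothetical stand-in for the answerer and returns
    True when the target is in `subset`. Returns (identified candidate, questions asked).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    remaining = list(candidates)
    questions = 0
    while len(remaining) > 1:
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        questions += 1
        # Each answer excludes half of the current candidates.
        remaining = first_half if oracle(first_half) else remaining[mid:]
    return remaining[0], questions


# Usage: 8 toy objects with hidden target "obj5"; log2(8) = 3 questions suffice.
objects = [f"obj{i}" for i in range(8)]
target = "obj5"
print(halving_dialogue(objects, lambda subset: target in subset))
```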
arXiv Detail & Related papers (2025-02-09T08:16:09Z)
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues [8.606637030092708]
We focus on the Builder Action Prediction (BAP) subtask in the Minecraft Collaborative Building Task (MCBT). BAP predicts B's actions in a multimodal game context with limited training data. We introduce BAP v2 to address key challenges in evaluation, training data, and modeling.
arXiv Detail & Related papers (2025-01-18T18:06:03Z)
Leverage Task Context for Object Affordance Ranking [57.59106517732223]
We build the first large-scale task-oriented affordance ranking dataset with 25 common tasks, over 50k images and more than 661k objects. Results demonstrate the feasibility of the task context based affordance learning paradigm and the superiority of our model over state-of-the-art models in the fields of saliency ranking and multimodal object detection.
arXiv Detail & Related papers (2024-11-25T04:22:33Z)
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine [0.8749675983608172]
We focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting. We propose MultiRankIt, which is a novel approach for the learning-to-rank physical objects task.
arXiv Detail & Related papers (2023-12-26T01:40:31Z)
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange. This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a few-shot reinforcement learning problem in which a task is characterized by a subtask graph. Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure, in terms of the subtask graph, from the training tasks. Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains. We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
Learning to Execute Actions or Ask Clarification Questions [9.784428580459776]
We propose a new builder agent model capable of determining when to ask or execute instructions. Experimental results show that our model achieves state-of-the-art performance on the collaborative building task.
arXiv Detail & Related papers (2022-04-18T15:36:02Z)
Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework [17.017688226277834]
We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2021-10-14T01:30:36Z)
Hierarchical Ranking for Answer Selection [19.379777219863964]
We propose a novel strategy for answer selection, called hierarchical ranking. We introduce three levels of ranking: point-level ranking, pair-level ranking, and list-level ranking. Experimental results on two public datasets, WikiQA and TREC-QA, demonstrate that the proposed hierarchical ranking is effective.
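The three ranking levels in this summary correspond to common learning-to-rank granularities. The sketch below gives one textbook loss per level (pointwise binary cross-entropy, pairwise hinge, listwise softmax cross-entropy) over the candidate answers for a single question. These particular losses, the `margin` parameter, and the function names are assumptions for illustration, not necessarily what the hierarchical ranking paper uses.

```python
import math
from typing import Sequence


def pointwise_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Point level: score each answer independently (mean binary cross-entropy on logits)."""
    total = 0.0
    for s, y in zip(scores, labels):
        # Numerically stable form of -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(scores)


def pairwise_loss(scores: Sequence[float], labels: Sequence[int], margin: float = 1.0) -> float:
    """Pair level: each correct answer should outscore each wrong one by `margin`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(sp, sn) for sp in pos for sn in neg]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (sp - sn)) for sp, sn in pairs) / len(pairs)


def listwise_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """List level: cross-entropy between softmax(scores) and the normalized labels."""
    mass = sum(labels)
    if mass == 0:
        raise ValueError("need at least one correct answer in the list")
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
    return -sum((y / mass) * (s - log_z) for s, y in zip(scores, labels))


# Usage on one toy question with three candidate answers, the first one correct.
scores, labels = [2.0, 0.0, -1.0], [1, 0, 0]
print(pointwise_loss(scores, labels), pairwise_loss(scores, labels), listwise_loss(scores, labels))
```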
arXiv Detail & Related papers (2021-02-01T07:35:52Z)
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment. Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses. Previous studies on recognizing and classifying inappropriate content mostly focus on a specific category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.