IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents
- URL: http://arxiv.org/abs/2407.08898v1
- Date: Fri, 12 Jul 2024 00:07:43 GMT
- Title: IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents
- Authors: Shrestha Mohanty, Negar Arabzadeh, Andrea Tupini, Yuxuan Sun, Alexey Skrynnik, Artem Zholus, Marc-Alexandre Côté, Julia Kiseleva
- Abstract summary: This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions.
We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment.
We present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance.
- Score: 20.460482488872145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Seamless interaction between AI agents and humans using natural language remains a key goal in AI research. This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions through the IGLU competition at NeurIPS. Despite advancements, challenges such as a scarcity of appropriate datasets and the need for effective evaluation platforms persist. We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment, resulting in a Multi-Modal dataset with around 9,000 utterances and over 1,000 clarification questions. Additionally, we present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance through multi-turn communication with human annotators. We offer to the community these assets referred to as IDAT (IGLU Dataset And Toolkit) which aim to advance the development of intelligent, interactive AI agents and provide essential resources for further research.
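As a concrete illustration of how a multi-turn dialogue dataset like this might be consumed, the minimal Python sketch below iterates over dialogue records and tallies utterances and clarification questions. It is hypothetical: the file name `idat_dialogues.jsonl` and the field names (`turns`, `is_clarification_question`) are illustrative assumptions, not IDAT's documented schema.
```python
# Hypothetical sketch of iterating over IDAT-style dialogue records.
# The file name and field names below are illustrative assumptions,
# not the dataset's documented schema.
import json

def load_dialogues(path: str):
    """Yield one multi-turn dialogue record per JSON line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

if __name__ == "__main__":
    n_utterances = 0
    n_clarifications = 0
    for record in load_dialogues("idat_dialogues.jsonl"):
        for turn in record.get("turns", []):
            n_utterances += 1
            # Clarification questions arise when the agent asks the
            # instruction-giver to resolve an ambiguous instruction.
            if turn.get("is_clarification_question"):
                n_clarifications += 1
    print(f"{n_utterances} utterances, {n_clarifications} clarification questions")
```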
Related papers
- Simulating User Agents for Embodied Conversational-AI [9.402740034754455]
We build a large language model (LLM)-based user agent that can simulate user behavior during interactions with an embodied agent.
We evaluate our user agent's ability to generate human-like behaviors by comparing its simulated dialogues with the TEACh dataset.
arXiv Detail & Related papers (2024-10-31T00:56:08Z)
- A Survey on Complex Tasks for Goal-Directed Interactive Agents [60.53915548970061]
This survey compiles relevant tasks and environments for evaluating goal-directed interactive agents.
An up-to-date compilation of relevant resources can be found on our project website.
arXiv Detail & Related papers (2024-09-27T08:17:53Z)
- CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data [7.357348564300953]
CI-Bench is a comprehensive benchmark for evaluating the ability of AI assistants to protect personal information during model inference.
We present a novel, scalable, multi-step data pipeline for generating natural communications, including dialogues and emails.
We formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks.
arXiv Detail & Related papers (2024-09-20T21:14:36Z)
- Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? [55.28340832822234]
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections.
We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
arXiv Detail & Related papers (2023-07-13T20:02:50Z)
- Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
- Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions [23.318236094953072]
Human intelligence is remarkably adaptable, allowing us to adjust swiftly to new tasks and multi-modal environments.
The research community is actively pursuing the development of interactive "embodied agents".
These agents must possess the ability to promptly request feedback in case communication breaks down or instructions are unclear.
arXiv Detail & Related papers (2023-05-18T07:51:33Z)
- Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback [42.19685958922537]
We argue that human-AI collaboration should be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize.
In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a Minecraft-like world.
arXiv Detail & Related papers (2023-04-21T05:37:59Z)
- Collecting Interactive Multi-modal Datasets for Grounded Language Understanding [66.30648042100123]
We formalized the task of a collaborative embodied agent that uses natural language.
We developed a tool for extensive and scalable data collection.
We collected the first dataset for interactive grounded language understanding.
arXiv Detail & Related papers (2022-11-12T02:36:32Z)
- SPA: Verbal Interactions between Agents and Avatars in Shared Virtual Environments using Propositional Planning [61.335252950832256]
Sense-Plan-Ask, or SPA, generates plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments.
We find that our algorithm incurs only a small runtime cost and enables agents to complete their goals more effectively than agents without the ability to leverage natural-language communication.
arXiv Detail & Related papers (2020-02-08T23:15:06Z)