Human in the loop approaches in multi-modal conversational task guidance system development
- URL: http://arxiv.org/abs/2211.01824v1
- Date: Thu, 3 Nov 2022 14:05:30 GMT
- Title: Human in the loop approaches in multi-modal conversational task guidance system development
- Authors: Ramesh Manuvinakurike, Sovan Biswas, Giuseppe Raffa, Richard Beckwith,
Anthony Rhodes, Meng Shi, Gesem Gudino Mejia, Saurav Sahay, Lama Nachman
- Abstract summary: Development of task guidance systems for aiding humans in a situated task remains a challenging problem.
We first highlight some of the challenges involved during the development of such systems.
We then provide an overview of existing datasets available and highlight their limitations.
- Score: 6.493148232868973
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Development of task guidance systems for aiding humans in a situated task
remains a challenging problem. The role of search (information retrieval) and
conversational systems for task guidance has immense potential to help the task
performers achieve various goals. However, there are several technical
challenges that need to be addressed to deliver such conversational systems,
where common supervised approaches fail to deliver the expected results in
terms of overall performance, user experience and adaptation to realistic
conditions. In this preliminary work we first highlight some of the challenges
involved during the development of such systems. We then provide an overview of
existing datasets available and highlight their limitations. We finally develop
a model-in-the-loop wizard-of-oz based data collection tool and perform a pilot
experiment.
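The abstract does not describe the tool itself, so the following is only a rough sketch of what a model-in-the-loop wizard-of-oz collection loop can look like; every name in it (draft_guidance, WizardSession, the logging format) is invented for illustration. A model drafts each guidance turn, the human wizard accepts or edits it before it reaches the task performer, and both versions are logged.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import json
import time

def draft_guidance(dialogue_history: List[Dict]) -> str:
    """Stand-in for the assistant model that proposes the next guidance utterance."""
    # A real tool would call a trained dialogue or retrieval model here.
    return "Next, whisk the eggs until the mixture is pale and fluffy."

@dataclass
class WizardSession:
    """Model-in-the-loop wizard-of-oz loop: the model drafts a reply, the human
    wizard accepts or edits it, and every turn is logged for later training."""
    log: List[Dict] = field(default_factory=list)

    def step(self, user_utterance: str) -> str:
        draft = draft_guidance(self.log + [{"role": "user", "text": user_utterance}])
        edited = input(f"[wizard] draft: {draft!r} -- edit or press Enter to accept: ") or draft
        self.log.append({"role": "user", "text": user_utterance, "t": time.time()})
        self.log.append({"role": "wizard", "draft": draft, "sent": edited, "t": time.time()})
        return edited

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.log, f, indent=2)
```

One reason to log both the model draft and the wizard's edit is that the edits can later double as correction signals for improving the model.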
Related papers
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake? [62.59699229202307]
Despite advances in AI, it remains a significant challenge to develop interactive task guidance systems.
We created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor.
We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance.
arXiv Detail & Related papers (2023-11-01T15:13:49Z) - Solving Dialogue Grounding Embodied Task in a Simulated Environment
using Further Masked Language Modeling [0.0]
Our proposed method employs further masked language modeling to enhance task understanding with state-of-the-art (SOTA) language models.
Our experimental results provide compelling evidence of the superiority of our proposed method.
arXiv Detail & Related papers (2023-06-21T17:17:09Z) - Learning by Asking for Embodied Visual Navigation and Task Completion [20.0182240268864]
We propose an Embodied Learning-By-Asking (ELBA) model that learns when and what questions to ask to dynamically acquire additional information for completing the task.
Experimental results show that ELBA achieves improved task performance compared to baseline models without question-answering capabilities.
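ELBA's policy for deciding when and what to ask is learned; purely as a toy illustration of the idea (the entropy threshold and contrast heuristic below are invented, not taken from the paper), an agent could gate question-asking on the uncertainty of its action distribution:

```python
import math
from typing import Dict, Tuple

def entropy(probs: Dict[str, float]) -> float:
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def maybe_ask(action_probs: Dict[str, float], threshold: float = 1.0) -> Tuple[str, str]:
    """Ask a clarification question when the action distribution is too uncertain;
    otherwise commit to the most likely action."""
    if entropy(action_probs) > threshold:
        # "What" to ask: contrast the two most likely actions (illustrative heuristic).
        top2 = sorted(action_probs, key=action_probs.get, reverse=True)[:2]
        return "ask", f"Should I '{top2[0]}' or '{top2[1]}' next?"
    return "act", max(action_probs, key=action_probs.get)

print(maybe_ask({"pick up knife": 0.4, "pick up spoon": 0.35, "open drawer": 0.25}))
```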
arXiv Detail & Related papers (2023-02-09T18:59:41Z) - Measuring Progress on Scalable Oversight for Large Language Models [19.705153174673576]
We present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail.
We find that human participants who interact with an unreliable large-language-model dialog assistant through chat substantially outperform both the model alone and their own unaided performance.
arXiv Detail & Related papers (2022-11-04T17:03:49Z) - Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
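The mechanism named in the title, tagging every training example with a task-identifying prefix so that one shared model can relate tasks, can be sketched generically; the tags and tasks below are hypothetical and not the paper's actual prefixes.

```python
from typing import List, Tuple

# Hypothetical task tags; the real prefixes used by the framework may differ.
TASK_PREFIX = {
    "nli": "[NLI]",
    "qa": "[QA]",
    "summarization": "[SUM]",
}

def with_task_prefix(task: str, text: str) -> str:
    """Prepend a task tag so a shared model sees which task each example comes from."""
    return f"{TASK_PREFIX[task]} {text}"

def build_batch(examples: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """examples: (task, input_text, target_text) -> (prefixed_input, target_text)."""
    return [(with_task_prefix(task, x), y) for task, x, y in examples]

batch = build_batch([
    ("qa", "question: Who wrote Hamlet? context: ...", "Shakespeare"),
    ("nli", "premise: ... hypothesis: ...", "entailment"),
])
print(batch[0][0])  # "[QA] question: Who wrote Hamlet? context: ..."
```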
arXiv Detail & Related papers (2022-10-12T15:02:04Z) - Task Allocation using a Team of Robots [29.024300177453824]
We present a general formulation of the task allocation problem that generalizes several well-studied versions.
Our formulation includes the states of robots, tasks, and the surrounding environment in which they operate.
We describe how the problem can vary depending on the feasibility constraints, objective functions, and the level of dynamically changing information.
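A bare-bones rendering of such a formulation (robot and task states, a feasibility constraint, and a cost objective) might look as follows; the toy code is illustrative only and does not follow the paper's notation.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple

@dataclass(frozen=True)
class Robot:
    name: str
    position: Tuple[float, float]
    capabilities: frozenset

@dataclass(frozen=True)
class Task:
    name: str
    position: Tuple[float, float]
    required: frozenset

def feasible(robot: Robot, task: Task) -> bool:
    # Feasibility constraint: the robot must cover the task's required capabilities.
    return task.required <= robot.capabilities

def cost(robot: Robot, task: Task) -> float:
    # Toy distance-based objective; real objectives may also include time, energy, or risk.
    return sum((a - b) ** 2 for a, b in zip(robot.position, task.position)) ** 0.5

def allocate(robots: List[Robot], tasks: List[Task]):
    """Brute-force one-robot-per-task allocation minimizing total cost (tiny instances only)."""
    best, best_cost = None, float("inf")
    for assignment in permutations(robots, len(tasks)):
        if all(feasible(r, t) for r, t in zip(assignment, tasks)):
            c = sum(cost(r, t) for r, t in zip(assignment, tasks))
            if c < best_cost:
                best, best_cost = list(zip(assignment, tasks)), c
    return best, best_cost

robots = [Robot("r1", (0, 0), frozenset({"grasp"})), Robot("r2", (5, 0), frozenset({"grasp", "cut"}))]
tasks = [Task("slice onion", (4, 1), frozenset({"cut"})), Task("fetch bowl", (1, 1), frozenset({"grasp"}))]
print(allocate(robots, tasks))
```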
arXiv Detail & Related papers (2022-07-20T04:49:11Z) - BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [108.41464483878683]
We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks.
We develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions.
When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%.
arXiv Detail & Related papers (2022-02-04T07:30:48Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
We propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
We also propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
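Of the two ingredients, weighted sampling is simple enough to sketch in isolation; the task names and weights below are hypothetical and chosen only to show the mechanism of oversampling harder compositional sequences.

```python
import random
from typing import Dict

def sample_task(weights: Dict[str, float], rng: random.Random) -> str:
    """Draw the next training task in proportion to its weight."""
    tasks, w = zip(*weights.items())
    return rng.choices(tasks, weights=w, k=1)[0]

# Hypothetical weights that oversample the longer compositional task sequences.
task_weights = {"stack": 1.0, "sort-then-stack": 2.0, "sort-stack-pack": 3.0}
rng = random.Random(0)
print([sample_task(task_weights, rng) for _ in range(6)])
```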
arXiv Detail & Related papers (2021-09-15T21:19:11Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
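As a loose illustration of steering exploration toward human-specified important states (not BEE's actual mechanism), one could score visited states by their proximity to embeddings of the human-provided example images and use that score as an exploration bonus:

```python
import numpy as np

def exploration_bonus(state_embedding: np.ndarray, example_embeddings: np.ndarray) -> float:
    """Reward states that look similar to some human-provided example of an important state."""
    dists = np.linalg.norm(example_embeddings - state_embedding, axis=1)
    return float(np.exp(-dists.min()))

examples = np.random.randn(5, 16)  # placeholder embeddings of the human-provided images
state = np.random.randn(16)        # placeholder embedding of the current observation
print(exploration_bonus(state, examples))
```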
arXiv Detail & Related papers (2020-10-22T17:49:25Z)