Can Foundation Models Watch, Talk and Guide You Step by Step to Make a
Cake?
- URL: http://arxiv.org/abs/2311.00738v1
- Date: Wed, 1 Nov 2023 15:13:49 GMT
- Title: Can Foundation Models Watch, Talk and Guide You Step by Step to Make a
Cake?
- Authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar
Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai
- Abstract summary: Despite advances in AI, it remains a significant challenge to develop interactive task guidance systems.
We created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor.
We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance.
- Score: 62.59699229202307
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite tremendous advances in AI, it remains a significant challenge to
develop interactive task guidance systems that can offer situated, personalized
guidance and assist humans in various tasks. These systems need to have a
sophisticated understanding of the user as well as the environment, and make
timely accurate decisions on when and what to say. To address this issue, we
created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based
on natural interaction between a human user and a human instructor. We further
proposed two tasks: User and Environment Understanding, and Instructor Decision
Making. We leveraged several foundation models to study to what extent these
models can be quickly adapted to perceptually enabled task guidance. Our
quantitative, qualitative, and human evaluation results show that these models
can demonstrate fair performance in some cases with no task-specific training,
but fast and reliable adaptation remains a significant challenge. Our
benchmark and baselines will provide a stepping stone for future work on
situated task guidance.
Related papers
- Optimising Human-AI Collaboration by Learning Convincing Explanations [62.81395661556852]
We propose a method for a collaborative system that remains safe by having a human make the decisions.
Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations.
arXiv Detail & Related papers (2023-11-13T16:00:16Z) - Designing Closed-Loop Models for Task Allocation [36.04165658325371]
We exploit weak prior information on human-task similarity to bootstrap model training.
We show that the use of such a weak prior can improve task allocation accuracy, even when human decision-makers are fallible and biased.
arXiv Detail & Related papers (2023-05-31T13:57:56Z) - Object-Centric Multi-Task Learning for Human Instances [8.035105819936808]
We explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning.
We propose a novel query design to encode the human instance information effectively, called human-centric query (HCQ)
Experimental results show that the proposed multi-task network achieves comparable accuracy to state-of-the-art task-specific models.
arXiv Detail & Related papers (2023-03-13T01:10:50Z) - Learning by Asking for Embodied Visual Navigation and Task Completion [20.0182240268864]
We propose an Embodied Learning-By-Asking (ELBA) model that learns when and what questions to ask to dynamically acquire additional information for completing the task.
Experimental results show that ELBA achieves improved task performance compared to baseline models without question-answering capabilities.
arXiv Detail & Related papers (2023-02-09T18:59:41Z) - Measuring Progress on Scalable Oversight for Large Language Models [19.705153174673576]
We present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail.
We find that human participants who interact with an unreliable large-language-model dialog assistant through chat substantially outperform both the model alone and their own unaided performance.
arXiv Detail & Related papers (2022-11-04T17:03:49Z) - Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z) - Autonomous Open-Ended Learning of Tasks with Non-Stationary
Interdependencies [64.0476282000118]
Intrinsic motivations have been shown to generate a task-agnostic signal for properly allocating training time amongst goals.
While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent of each other, only a few have studied the autonomous acquisition of interdependent tasks.
In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture.
Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences.
arXiv Detail & Related papers (2022-05-16T10:43:01Z) - Recent Advances in Leveraging Human Guidance for Sequential
Decision-Making Tasks [60.380501589764144]
A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making.
While it is the artificial agent that learns and acts, it is still up to humans to specify the particular task to be performed.
This survey provides a high-level overview of five recent machine learning frameworks that primarily rely on human guidance.
arXiv Detail & Related papers (2021-07-13T03:11:04Z) - CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and
Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.