HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI
Assistants in the Real World
- URL: http://arxiv.org/abs/2309.17024v1
- Date: Fri, 29 Sep 2023 07:17:43 GMT
- Title: HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI
Assistants in the Real World
- Authors: Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean
Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel
Joshi, Marc Pollefeys
- Abstract summary: This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world.
We introduce HoloAssist, a large-scale egocentric human interaction dataset.
We present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment.
- Score: 48.90399899928823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building an interactive AI assistant that can perceive, reason, and
collaborate with humans in the real world has been a long-standing pursuit in
the AI community. This work is part of a broader research effort to develop
intelligent agents that can interactively guide humans through performing tasks
in the physical world. As a first step in this direction, we introduce
HoloAssist, a large-scale egocentric human interaction dataset, where two
people collaboratively complete physical manipulation tasks. The task performer
executes the task while wearing a mixed-reality headset that captures seven
synchronized data streams. The task instructor watches the performer's
egocentric video in real time and guides them verbally. By augmenting the data
with action and conversational annotations and observing the rich behaviors of
various participants, we present key insights into how human assistants correct
mistakes, intervene in the task completion procedure, and ground their
instructions to the environment. HoloAssist spans 166 hours of data captured by
350 unique instructor-performer pairs. Furthermore, we construct and present
benchmarks on mistake detection, intervention type prediction, and hand
forecasting, along with detailed analysis. We expect HoloAssist will provide an
important resource for building AI assistants that can fluidly collaborate with
humans in the real world. Data can be downloaded at
https://holoassist.github.io/.
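The abstract names the benchmarks (mistake detection, intervention type prediction, hand forecasting) but does not spell out an evaluation protocol. As a rough illustration only, and not the authors' protocol, mistake detection on such data is commonly framed as binary classification over annotated action clips scored with precision/recall/F1. The clip fields, the `predict_mistake` stub, and the label schema below are hypothetical placeholders and are not part of the HoloAssist release.
```python
# Hypothetical sketch: mistake detection framed as per-clip binary classification.
# Field names and functions are illustrative placeholders, not the HoloAssist API.
from dataclasses import dataclass
from typing import List

@dataclass
class Clip:
    video_id: str
    start_s: float     # clip start time in seconds (assumed schema)
    end_s: float       # clip end time in seconds (assumed schema)
    is_mistake: bool   # ground-truth annotation (assumed schema)

def predict_mistake(clip: Clip) -> bool:
    """Placeholder for a real model; trivially predicts 'no mistake'."""
    return False

def evaluate(clips: List[Clip]) -> dict:
    # Standard precision/recall/F1 over the positive (mistake) class.
    tp = sum(1 for c in clips if predict_mistake(c) and c.is_mistake)
    fp = sum(1 for c in clips if predict_mistake(c) and not c.is_mistake)
    fn = sum(1 for c in clips if not predict_mistake(c) and c.is_mistake)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    demo = [Clip("v0", 0.0, 4.2, False), Clip("v0", 4.2, 9.8, True)]
    print(evaluate(demo))
```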
Related papers
- EgoMimic: Scaling Imitation Learning via Egocentric Video [22.902881956495765]
We present EgoMimic, a full-stack framework which scales manipulation via human embodiment data.
EgoMimic achieves this through: (1) a system to capture human embodiment data using the ergonomic Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, and (3) an imitation learning architecture that co-trains on human and robot data.
arXiv Detail & Related papers (2024-10-31T17:59:55Z)
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World [44.34800426136217]
We introduce EgoExoLearn, a dataset that emulates the human demonstration following process.
EgoExoLearn contains egocentric and demonstration video data spanning 120 hours.
We present benchmarks such as cross-view association, cross-view action planning, and cross-view referenced skill assessment.
arXiv Detail & Related papers (2024-03-24T15:00:44Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback [42.19685958922537]
We argue that human-AI collaboration should be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize.
In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a Minecraft-like world.
arXiv Detail & Related papers (2023-04-21T05:37:59Z)
- Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling [59.08591308749448]
We interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI.
Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons.
arXiv Detail & Related papers (2023-04-17T15:30:05Z)
- Egocentric Video Task Translation [109.30649877677257]
We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once.
Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition.
arXiv Detail & Related papers (2022-12-13T00:47:13Z)
- EgoTaskQA: Understanding Human Tasks in Egocentric Videos [89.9573084127155]
The EgoTaskQA benchmark provides a home for crucial dimensions of task understanding through question answering on real-world egocentric videos.
We meticulously design questions that target the understanding of (1) action dependencies and effects, (2) intents and goals, and (3) agents' beliefs about others.
We evaluate state-of-the-art video reasoning models on our benchmark and show that they lag significantly behind humans in understanding complex goal-oriented egocentric videos.
arXiv Detail & Related papers (2022-10-08T05:49:05Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
- The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain [20.99718135562034]
We introduce MECCANO, the first dataset of egocentric videos to study human-object interactions in industrial-like settings.
The dataset has been explicitly labeled for the task of recognizing human-object interactions from an egocentric perspective.
Baseline results show that the MECCANO dataset is a challenging benchmark to study egocentric human-object interactions in industrial-like scenarios.
arXiv Detail & Related papers (2020-10-12T12:50:30Z)