TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
- URL: http://arxiv.org/abs/2401.08399v2
- Date: Mon, 25 Mar 2024 16:50:43 GMT
- Title: TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
- Authors: Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi
- Abstract summary: TACO is an extensive dataset spanning a large variety of tool-action-object compositions for daily human activities.
TACO contains 2.5K motion sequences paired with third-person and egocentric views, precise hand-object 3D meshes, and action labels.
We benchmark three generalizable hand-object-interaction tasks: compositional action recognition, generalizable hand-object motion forecasting, and cooperative grasp synthesis.
- Score: 44.206222326616526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans commonly work with multiple objects in daily life and can intuitively transfer manipulation skills to novel objects by understanding object functional regularities. However, existing technical approaches for analyzing and synthesizing hand-object manipulation are mostly limited to handling a single hand and object due to the lack of data support. To address this, we construct TACO, an extensive bimanual hand-object-interaction dataset spanning a large variety of tool-action-object compositions for daily human activities. TACO contains 2.5K motion sequences paired with third-person and egocentric views, precise hand-object 3D meshes, and action labels. To rapidly expand the data scale, we present a fully automatic data acquisition pipeline combining multi-view sensing with an optical motion capture system. With the vast research fields provided by TACO, we benchmark three generalizable hand-object-interaction tasks: compositional action recognition, generalizable hand-object motion forecasting, and cooperative grasp synthesis. Extensive experiments reveal new insights, challenges, and opportunities for advancing the studies of generalizable hand-object motion analysis and synthesis. Our data and code are available at https://taco2024.github.io.
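To make the tool-action-object composition concrete, here is a minimal Python sketch of how a single TACO-style motion sequence could be organized in memory. All field and function names are hypothetical illustrations, not the dataset's actual schema or loader API; the released data and tooling live at https://taco2024.github.io.

```python
# Illustrative only: a hypothetical record layout for one TACO-style motion
# sequence. Field names are assumptions for exposition, not the dataset's
# actual schema.
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class HandObjectSequence:
    # The compositional action label, e.g. ("spoon", "stir", "bowl").
    tool: str
    action: str
    target_object: str
    # Third-person videos, one (T, H, W, 3) array per camera, plus one
    # egocentric stream.
    third_person_views: List[np.ndarray]
    egocentric_view: np.ndarray
    # Per-frame 3D meshes: vertex arrays of shape (T, V, 3); mesh faces are
    # shared across frames and omitted here for brevity.
    left_hand_vertices: np.ndarray
    right_hand_vertices: np.ndarray
    tool_vertices: np.ndarray
    object_vertices: np.ndarray


def composition_key(seq: HandObjectSequence) -> Tuple[str, str, str]:
    """Return the tool-action-object triple identifying a composition."""
    return (seq.tool, seq.action, seq.target_object)
```

One plausible way to probe compositional generalization with such a layout is to split train and test sets by `composition_key` rather than by raw sequence: a model may see "stir" and "bowl" separately during training but must handle their unseen combination at test time.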
Related papers
- HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects [86.86284624825356]
HIMO is a dataset of full-body humans interacting with multiple objects.
HIMO contains 3.3K 4D HOI sequences and 4.08M 3D HOI frames.
arXiv Detail & Related papers (2024-07-17T07:47:34Z)
- OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion [39.14950571922401]
OAKINK2 is a dataset of bimanual object manipulation tasks for complex daily activities.
It introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task.
The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands, and various interacting objects.
arXiv Detail & Related papers (2024-03-28T13:47:19Z)
- ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living [4.221961702292134]
ADL4D is a dataset of up to two subjects interacting with different sets of objects while performing Activities of Daily Living (ADL).
Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations.
We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time.
arXiv Detail & Related papers (2024-02-27T18:51:52Z)
- ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding [8.923830513183882]
We present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras.
In the ATTACH dataset, more than 68% of annotations overlap with other annotations, which is many times more than in related datasets.
We report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs.
arXiv Detail & Related papers (2023-04-17T12:31:24Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured full-body articulated human-object interaction (f-AHOI) dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation to tackle articulated object pose and shape estimation during whole-body interactions.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map (an illustrative sketch of this scheme follows this entry).
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
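The cross-branch reciprocity described in the entry above can be pictured with a short PyTorch sketch: two convolutional branches encode an RGB image augmented by an estimated depth map, and each branch is refined with the other's features. Module names, layer sizes, and the one-layer depth head are assumptions for illustration, not the paper's actual architecture.

```python
# A minimal sketch of cross-branch feature fusion with auxiliary depth
# estimation. All layers are illustrative stand-ins for the paper's networks.
import torch
import torch.nn as nn


class CrossBranchFusion(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Auxiliary depth estimator: predicts a 1-channel depth map from RGB.
        self.depth_head = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        # Separate hand and object encoders, both consuming RGB concatenated
        # with the estimated depth map (4 input channels).
        self.hand_enc = nn.Conv2d(4, feat_dim, kernel_size=3, padding=1)
        self.obj_enc = nn.Conv2d(4, feat_dim, kernel_size=3, padding=1)
        # Fusion layers: each branch is refined from both branches' features.
        self.hand_fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)
        self.obj_fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, rgb: torch.Tensor):
        depth = self.depth_head(rgb)           # (B, 1, H, W)
        x = torch.cat([rgb, depth], dim=1)     # RGB augmented with depth
        h, o = self.hand_enc(x), self.obj_enc(x)
        # Reciprocity: hand features inform the object branch and vice versa.
        h_out = self.hand_fuse(torch.cat([h, o], dim=1))
        o_out = self.obj_fuse(torch.cat([o, h], dim=1))
        return h_out, o_out
```

The design point is that neither branch operates in isolation: occlusion cues from the hand branch can disambiguate object geometry, and object evidence can likewise constrain the hand reconstruction.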