Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset
- URL: http://arxiv.org/abs/2510.16258v1
- Date: Fri, 17 Oct 2025 23:06:36 GMT
- Title: Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset
- Authors: Claire McLean, Makenzie Meendering, Tristan Swartz, Orri Gabbay, Alexandra Olsen, Rachel Jacobs, Nicholas Rosen, Philippe de Bree, Tony Garcia, Gadsden Merrill, Jake Sandakly, Julia Buffalini, Neham Jain, Steven Krenn, Moneish Kumar, Dejan Markovic, Evonne Ng, Fabian Prada, Andrew Saba, Siwei Zhang, Vasu Agrawal, Tim Godisart, Alexander Richard, Michael Zollhoefer
- Abstract summary: Embody 3D is a multimodal dataset of 500 individual hours of 3D motion data from 439 participants collected in a multi-camera collection stage. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion. We provide tracked human motion including hand tracking and body shape, text annotations, and a separate audio track for each participant.
- Score: 64.37557018733408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Codec Avatars Lab at Meta introduces Embody 3D, a multimodal dataset of 500 individual hours of 3D motion data from 439 participants collected in a multi-camera collection stage, amounting to over 54 million frames of tracked 3D motion. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion; as well as multi-person behavioral and conversational data like discussions, conversations in different emotional states, collaborative activities, and co-living scenarios in an apartment-like space. We provide tracked human motion including hand tracking and body shape, text annotations, and a separate audio track for each participant.
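As a rough illustration of how one capture in such a multimodal release might be organized, the sketch below defines a hypothetical per-sequence record in Python. All field names, shapes, and the default frame rate are illustrative assumptions, not the official Embody 3D format:

```python
# Hypothetical schema for one Embody 3D capture sequence.
# All field names and shapes are illustrative assumptions, not the official format.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class EmbodySequence:
    sequence_id: str
    participant_ids: list[str]            # one or more participants per capture
    motion: np.ndarray                    # (num_frames, num_joints, 3) tracked body joints
    hand_motion: np.ndarray               # (num_frames, num_hand_joints, 3) hand tracking
    body_shape: np.ndarray                # per-participant body shape coefficients
    text_annotations: list[str]           # free-form descriptions of the motion
    audio_tracks: dict[str, str] = field(default_factory=dict)  # participant id -> wav path


def approx_duration_seconds(seq: EmbodySequence, fps: float = 30.0) -> float:
    """Duration implied by the frame count at an assumed capture rate."""
    return seq.motion.shape[0] / fps
```

The 30 fps default is an assumption, but it is consistent with the abstract's headline numbers: 500 hours × 3,600 s/h × 30 frames/s = 54 million frames.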
Related papers
- InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos [35.49253697710172]
We propose an automatic motion extraction pipeline and use it to collect interaction-rich human motions. Our new dataset InterPose contains 73.8K sequences of 3D human motions and corresponding text captions automatically obtained from 45.8K videos with human-object interactions.
arXiv Detail & Related papers (2025-08-31T09:38:59Z)
- Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset [35.47253826828815]
Motion-X++ is a large-scale multimodal 3D expressive whole-body human motion dataset. Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from massive scenes.
arXiv Detail & Related papers (2025-01-09T09:37:27Z)
- HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos [9.513100627302755]
The dataset offers over 833 minutes (3.7M+ images) of recordings that feature 19 subjects interacting with 33 diverse rigid objects. The recordings include multiple synchronized data streams containing egocentric multi-view RGB/monochrome images, eye gaze signal, scene point clouds, and 3D poses of cameras, hands, and objects (a toy stream-alignment sketch follows this entry). In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, model-based 6DoF object pose estimation, and 3D lifting of unknown in-hand objects.
arXiv Detail & Related papers (2024-11-28T14:09:42Z)
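Since the HOT3D entry highlights multiple synchronized data streams captured at different rates, a common practical step is aligning samples by nearest timestamp. Below is a minimal, generic sketch of that idea; it assumes sorted timestamp arrays and does not reflect HOT3D's actual tooling or file formats:

```python
# Align two data streams by nearest timestamp (toy sketch; HOT3D's actual
# tooling and file formats may differ entirely).
import numpy as np


def nearest_timestamps(reference_ts: np.ndarray, stream_ts: np.ndarray) -> np.ndarray:
    """For each reference timestamp, return the index of the closest stream sample.

    Both arrays must be sorted in ascending order and use the same time unit.
    """
    pos = np.searchsorted(stream_ts, reference_ts)
    pos = np.clip(pos, 1, len(stream_ts) - 1)
    left, right = stream_ts[pos - 1], stream_ts[pos]
    # Pick whichever neighbor is closer to the reference time.
    return np.where(reference_ts - left <= right - reference_ts, pos - 1, pos)
```

For example, gaze samples could be indexed at RGB frame times with `idx = nearest_timestamps(rgb_ts, gaze_ts)`.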
- Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild [66.34146236875822]
The Nymeria dataset is a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices.
It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399 km.
The motion-language descriptions provide 310.5K sentences comprising 8.64M words, drawn from a vocabulary of 6,545 words.
arXiv Detail & Related papers (2024-06-14T10:23:53Z)
- ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [10.364340631868322]
We introduce our ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment. Our system features a multi-view setup with 70 synchronized RGB cameras, along with wearable motion capture devices including an IMU-based body suit and hand motion capture gloves. By leveraging the ParaHome system, we collect a new human-object interaction dataset, including 486 minutes of sequences across 207 captures with 38 participants.
arXiv Detail & Related papers (2024-01-18T18:59:58Z)
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data points.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-Scene Interaction (HSI) generation is challenging and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
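The occupancy idea above can be made concrete with a toy sketch: voxels swept by tracked joints are marked as free space, and the rest remain possibly occupied. Everything here, from the voxel scheme to the function name, is an illustrative assumption rather than the paper's implementation:

```python
# Toy sketch: derive a free-space grid from motion alone. Voxels visited by
# any tracked joint are marked free; the rest stay "possibly occupied".
import numpy as np


def free_space_from_motion(joints: np.ndarray, voxel_size: float = 0.1,
                           grid_shape=(64, 64, 64),
                           origin=(-3.2, -3.2, 0.0)) -> np.ndarray:
    """joints: (num_frames, num_joints, 3) world-space positions in meters."""
    occupied = np.ones(grid_shape, dtype=bool)           # start fully "possibly occupied"
    idx = np.floor((joints.reshape(-1, 3) - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[inside]                                    # drop joints outside the grid
    occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = False    # swept voxels are free
    return occupied
```

Pairing each motion clip with the complement of its swept volume gives the kind of paired human-occupancy data the summary describes, without requiring scanned scene geometry.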
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from 5 subjects across nine scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.