ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
- URL: http://arxiv.org/abs/2401.10232v1
- Date: Thu, 18 Jan 2024 18:59:58 GMT
- Title: ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
- Authors: Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo
- Abstract summary: We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment.
By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
- Score: 11.32229757116179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To enable machines to learn how humans interact with the physical world in
our daily activities, it is crucial to provide rich data that encompasses the
3D motion of humans as well as the motion of objects in a learnable 3D
representation. Ideally, this data should be collected in a natural setup,
capturing the authentic dynamic 3D signals during human-object interactions. To
address this challenge, we introduce the ParaHome system, designed to capture
and parameterize dynamic 3D movements of humans and objects within a common
home environment. Our system consists of a multi-view setup with 70
synchronized RGB cameras, as well as wearable motion capture devices equipped
with an IMU-based body suit and hand motion capture gloves. By leveraging the
ParaHome system, we collect a novel large-scale dataset of human-object
interaction. Notably, our dataset offers key advancements over existing datasets
in three main aspects: (1) capturing 3D body and dexterous hand manipulation
motion alongside 3D object movement within a contextual home environment during
natural activities; (2) encompassing human interaction with multiple objects in
various episodic scenarios with corresponding descriptions in texts; (3)
including articulated objects with multiple parts expressed with parameterized
articulations. Building upon our dataset, we introduce new research tasks aimed
at building a generative model for learning and synthesizing human-object
interactions in a real-world room setting.
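
As a concrete reading of aspect (3), a parameterized articulation typically ties each movable part to the object base through a named joint (revolute or prismatic) whose state is a single scalar, so part motion is recovered by forward kinematics rather than stored as unstructured mesh motion. The sketch below illustrates that common convention; the function and field names are hypothetical and do not reflect ParaHome's actual data schema.

```python
import numpy as np

# Hypothetical sketch of a parameterized articulation: each part moves
# relative to the object base via a revolute (hinge) or prismatic (slider)
# joint with one scalar state. Illustrative only, not the dataset's schema.

def joint_transform(joint_type, axis, origin, state):
    """Return a 4x4 part-to-base transform for one joint state."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    T = np.eye(4)
    if joint_type == "revolute":
        # Rodrigues' formula: rotate by `state` radians about `axis` through `origin`.
        K = np.array([[0, -axis[2], axis[1]],
                      [axis[2], 0, -axis[0]],
                      [-axis[1], axis[0], 0]])
        R = np.eye(3) + np.sin(state) * K + (1 - np.cos(state)) * (K @ K)
        T[:3, :3] = R
        T[:3, 3] = origin - R @ origin  # rotation about a point off the origin
    elif joint_type == "prismatic":
        T[:3, 3] = axis * state  # translate `state` meters along `axis`
    return T

# Example: a cabinet door hinged about the vertical axis, opened 60 degrees.
door_to_base = joint_transform("revolute",
                               axis=[0.0, 0.0, 1.0],
                               origin=np.array([0.4, 0.0, 0.0]),
                               state=np.deg2rad(60.0))
print(door_to_base)
```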
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
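
For context on the representation named in the title: a 3D Gaussian splat is an anisotropic Gaussian whose covariance factors into a rotation and a scale, and a dynamic object is a set of splats moved by a per-frame rigid transform. The sketch below shows only that covariance construction, as generic 3DGS background rather than EgoGaussian's actual pipeline.

```python
import numpy as np

# Minimal sketch of the 3D Gaussian Splatting primitive: each scene element
# is an anisotropic Gaussian with mean, rotation, scale, opacity, and color,
# with world-space covariance Sigma = R S S^T R^T. Generic background only.

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def gaussian_covariance(quat, scale):
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# One splat; dynamic-object splats would additionally carry a per-frame
# rigid transform so the object can be tracked over time.
cov = gaussian_covariance(quat=[1.0, 0.0, 0.0, 0.0], scale=[0.05, 0.02, 0.02])
print(cov)
```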
- Object Motion Guided Human Motion Synthesis [22.08240141115053]
We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
arXiv Detail & Related papers (2023-09-28T08:22:00Z)
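
The phrase "conditional diffusion framework" here means the denoiser sees the object trajectory as conditioning while generating human motion. Below is a minimal DDPM-style sketch of that idea; the network, feature dimensions, and noise schedule are illustrative placeholders, not OMOMO's architecture.

```python
import torch
import torch.nn as nn

# Sketch of a conditional denoiser in the spirit of OMOMO: human motion is
# denoised by a diffusion model conditioned on object motion. Dimensions
# and architecture are invented for illustration.

class ObjectConditionedDenoiser(nn.Module):
    def __init__(self, human_dim=204, obj_dim=12, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(human_dim + obj_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, human_dim),
        )

    def forward(self, x_t, t, obj_motion):
        # x_t: (B, T, human_dim) noisy human motion at diffusion step t
        # obj_motion: (B, T, obj_dim) object trajectory used as conditioning
        t_emb = t.float().view(-1, 1, 1).expand(x_t.shape[0], x_t.shape[1], 1)
        return self.net(torch.cat([x_t, obj_motion, t_emb], dim=-1))

# One DDPM-style training step: predict the noise added to clean motion x0.
model = ObjectConditionedDenoiser()
x0 = torch.randn(4, 120, 204)          # clean human motion (4 clips, 120 frames)
obj = torch.randn(4, 120, 12)          # paired object motion
t = torch.randint(0, 1000, (4,))
alpha_bar = torch.cos(t.float() / 1000 * 1.57).view(-1, 1, 1) ** 2  # toy schedule
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
loss = ((model(x_t, t, obj) - noise) ** 2).mean()
loss.backward()
```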
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
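
One way to read "spatial cues" is as distance-based features relating each hand joint to the object surface, which then condition the hand-motion generator. The sketch below computes such per-joint proximities; it is a hedged illustration of the idea, not GRIP's published feature definition.

```python
import numpy as np

# Hypothetical illustration of a distance-based spatial cue for hand-object
# interaction: per-joint distances from hand joints to the nearest sampled
# object surface point, computed per frame.

def proximity_cues(hand_joints, object_points):
    """hand_joints: (J, 3), object_points: (P, 3) -> (J,) nearest distances."""
    diff = hand_joints[:, None, :] - object_points[None, :, :]  # (J, P, 3)
    return np.linalg.norm(diff, axis=-1).min(axis=1)

# Toy frame: 21 hand joints near a small point-sampled object.
hand = np.random.randn(21, 3) * 0.1
obj = np.random.randn(500, 3) * 0.05 + np.array([0.1, 0.0, 0.0])
cues = proximity_cues(hand, obj)
print(cues.shape, cues.min())  # (21,) and the closest approach in meters
```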
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI (full-body articulated human-object interaction) dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions, in which humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z)
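
The "composition of varying numbers of atomic action-object pairs" can be pictured as a variable-length token set fed to the transformer alongside the body and object encodings. A minimal sketch, with invented vocabularies and dimensions:

```python
import torch
import torch.nn as nn

# Sketch of interaction semantics as a variable-length set of atomic
# (action, object) pairs, embedded as tokens a transformer can attend to.
# Vocabularies and dimensions are invented for illustration.

ACTIONS = {"sit on": 0, "touch": 1, "lie on": 2}
OBJECTS = {"chair": 0, "table": 1, "bed": 2}

class InteractionSemantics(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.action_emb = nn.Embedding(len(ACTIONS), dim)
        self.object_emb = nn.Embedding(len(OBJECTS), dim)

    def forward(self, pairs):
        """pairs: list of (action, object) strings -> (N, dim) tokens."""
        a = torch.tensor([ACTIONS[p[0]] for p in pairs])
        o = torch.tensor([OBJECTS[p[1]] for p in pairs])
        return self.action_emb(a) + self.object_emb(o)

# "Sit on the chair while touching the table" = two atomic pairs, jointly encoded.
sem = InteractionSemantics()
tokens = sem([("sit on", "chair"), ("touch", "table")])
print(tokens.shape)  # torch.Size([2, 64])
```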
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
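
To see why contact forces are recoverable from video-derived motion, consider the simplest case: a rigid object held at a single contact, where Newton's second law pins down the net contact force from mass and acceleration. A toy sketch under those assumptions follows; the paper's actual method jointly estimates person pose, object pose, contact positions, and forces, and the mass and trajectory here are invented.

```python
import numpy as np

# Toy illustration: for a rigid object held at one contact, the net hand
# force follows from f = m*a - m*g once mass and acceleration are known.
# Acceleration is taken by central finite differences over the trajectory.

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2

def contact_force(mass, positions, dt):
    """positions: (T, 3) object trajectory -> (T-2, 3) single-contact forces."""
    accel = (positions[2:] - 2 * positions[1:-1] + positions[:-2]) / dt**2
    return mass * (accel - GRAVITY)  # f = m*a - m*g

# A 2 kg object lifted along a smooth vertical path sampled over one second.
t = np.linspace(0.0, 1.0, 31)
traj = np.stack([np.zeros_like(t), np.zeros_like(t), 0.5 * t**2], axis=-1)
forces = contact_force(mass=2.0, positions=traj, dt=t[1] - t[0])
print(forces[0])  # roughly [0, 0, 2*(1.0 + 9.81)] newtons
```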
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.