Object Motion Guided Human Motion Synthesis
- URL: http://arxiv.org/abs/2309.16237v1
- Date: Thu, 28 Sep 2023 08:22:00 GMT
- Title: Object Motion Guided Human Motion Synthesis
- Authors: Jiaman Li, Jiajun Wu, C. Karen Liu
- Abstract summary: We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
- Score: 22.08240141115053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling human behaviors in contextual environments has a wide range of
applications in character animation, embodied AI, VR/AR, and robotics. In
real-world scenarios, humans frequently interact with the environment and
manipulate various objects to complete daily tasks. In this work, we study the
problem of full-body human motion synthesis for the manipulation of large-sized
objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a
conditional diffusion framework that can generate full-body manipulation
behaviors from only the object motion. Since naively applying diffusion models
fails to precisely enforce contact constraints between the hands and the
object, OMOMO learns two separate denoising processes to first predict hand
positions from object motion and subsequently synthesize full-body poses based
on the predicted hand positions. By employing the hand positions as an
intermediate representation between the two denoising processes, we can
explicitly enforce contact constraints, resulting in more physically plausible
manipulation motions. With the learned model, we develop a novel system that
captures full-body human manipulation motions by simply attaching a smartphone
to the object being manipulated. Through extensive experiments, we demonstrate
the effectiveness of our proposed pipeline and its ability to generalize to
unseen objects. Additionally, as high-quality human-object interaction datasets
are scarce, we collect a large-scale dataset consisting of 3D object geometry,
object motion, and human motion. Our dataset contains human-object interaction
motion for 15 objects, with a total duration of approximately 10 hours.
Related papers
- Controlling the World by Sleight of Hand [26.874176292105556]
We learn an action-conditional generative models by learning from unlabeled videos of human hands interacting with objects.
Given an image, and the shape/location of a desired hand interaction, CosHand, synthesizes an image of a future after the interaction has occurred.
Experiments show that the resulting model can predict the effects of hand-object interactions well.
arXiv Detail & Related papers (2024-08-13T18:33:45Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Controllable Human-Object Interaction Synthesis [77.56877961681462]
We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes.
Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene.
Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
arXiv Detail & Related papers (2023-12-06T21:14:20Z) - GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z) - NIFTY: Neural Object Interaction Fields for Guided Human Motion
Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input.
This interaction field guides the sampling of an object-conditioned human motion diffusion model.
We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
arXiv Detail & Related papers (2023-07-14T17:59:38Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural
Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object
Interactions [69.95820880360345]
We present the first framework to synthesize the full-body motion of virtual human characters with 3D objects placed within their reach.
Our system takes as input textual instructions specifying the objects and the associated intentions of the virtual characters.
We show that our synthesized full-body motions appear more realistic to the participants in more than 80% of scenarios.
arXiv Detail & Related papers (2022-12-14T23:59:24Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.