Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model
- URL: http://arxiv.org/abs/2412.20657v1
- Date: Mon, 30 Dec 2024 02:21:43 GMT
- Title: Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model
- Authors: Yonghao Zhang, Qiang He, Yanguang Wan, Yinda Zhang, Xiaoming Deng, Cuixia Ma, Hongan Wang,
- Abstract summary: We propose a simple yet effective framework that jointly models the relationship between the body, hands, and the given object motion sequences.
We introduce novel contact-aware losses and incorporate a data-driven, carefully designed guidance.
Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible whole-body motion sequences.
- Score: 25.00532805042292
- License:
- Abstract: Generating high-quality whole-body human object interaction motion sequences is becoming increasingly important in various fields such as animation, VR/AR, and robotics. The main challenge of this task lies in determining the level of involvement of each hand given the complex shapes of objects in different sizes and their different motion trajectories, while ensuring strong grasping realism and guaranteeing the coordination of movement in all body parts. Contrasting with existing work, which either generates human interaction motion sequences without detailed hand grasping poses or only models a static grasping pose, we propose a simple yet effective framework that jointly models the relationship between the body, hands, and the given object motion sequences within a single diffusion model. To guide our network in perceiving the object's spatial position and learning more natural grasping poses, we introduce novel contact-aware losses and incorporate a data-driven, carefully designed guidance. Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible whole-body motion sequences.
Related papers
- BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects [70.20706475051347]
BimArt is a novel generative approach for synthesizing 3D bimanual hand interactions with articulated objects.
We first generate distance-based contact maps conditioned on the object trajectory with an articulation-aware feature representation.
The learned contact prior is then used to guide our hand motion generator, producing diverse and realistic bimanual motions for object movement and articulation.
arXiv Detail & Related papers (2024-12-06T14:23:56Z) - THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR)
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Hand-Centric Motion Refinement for 3D Hand-Object Interaction via
Hierarchical Spatial-Temporal Modeling [18.128376292350836]
We propose a data-driven method for coarse hand motion refinement.
First, we design a hand-centric representation to describe the dynamic spatial-temporal relation between hands and objects.
Second, to capture the dynamic clues of hand-object interaction, we propose a new architecture.
arXiv Detail & Related papers (2024-01-29T09:17:51Z) - Object Motion Guided Human Motion Synthesis [22.08240141115053]
We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
arXiv Detail & Related papers (2023-09-28T08:22:00Z) - GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z) - NIFTY: Neural Object Interaction Fields for Guided Human Motion
Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input.
This interaction field guides the sampling of an object-conditioned human motion diffusion model.
We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
arXiv Detail & Related papers (2023-07-14T17:59:38Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural
Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object
Interactions [69.95820880360345]
We present the first framework to synthesize the full-body motion of virtual human characters with 3D objects placed within their reach.
Our system takes as input textual instructions specifying the objects and the associated intentions of the virtual characters.
We show that our synthesized full-body motions appear more realistic to the participants in more than 80% of scenarios.
arXiv Detail & Related papers (2022-12-14T23:59:24Z) - Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.