HandDiffuse: Generative Controllers for Two-Hand Interactions via
Diffusion Models
- URL: http://arxiv.org/abs/2312.04867v1
- Date: Fri, 8 Dec 2023 07:07:13 GMT
- Title: HandDiffuse: Generative Controllers for Two-Hand Interactions via
Diffusion Models
- Authors: Pei Lin, Sihang Xu, Hongdi Yang, Yiran Liu, Xin Chen, Jingya Wang,
Jingyi Yu, Lan Xu
- Abstract summary: Existing hand datasets are largely short-range, and their interactions are weak due to the self-occlusion and self-similarity of hands.
To address this data scarcity, we propose HandDiffuse12.5M, a novel dataset that consists of temporal sequences with strong two-hand interactions.
- Score: 48.56319454887096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing hand datasets are largely short-range, and their interactions are weak
due to the self-occlusion and self-similarity of hands, so they cannot yet meet
the needs of interacting-hand motion generation. To address this data scarcity,
we propose HandDiffuse12.5M, a novel dataset that consists of temporal
sequences with strong two-hand interactions. HandDiffuse12.5M has the largest
scale and richest interactions among the existing two-hand datasets. We further
present a strong baseline method, HandDiffuse, for the controllable motion
generation of interacting hands using various controllers. Specifically, we
apply a diffusion model as the backbone and design two motion representations
for different controllers. To reduce artifacts, we also propose an Interaction
Loss that explicitly quantifies the dynamic interaction process. Our
HandDiffuse enables various applications with vivid two-hand interactions,
i.e., motion in-betweening and trajectory control. Experiments show that our
method outperforms the state-of-the-art techniques in motion generation and can
also contribute to data augmentation for other datasets. Our dataset,
corresponding codes, and pre-trained models will be disseminated to the
community for future research towards two-hand interaction modeling.
Related papers
- Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation [52.36691633451968]
ViTaM-D is a visual-tactile framework for dynamic hand-object interaction reconstruction.
DF-Field is a distributed force-aware contact representation model.
Our results highlight the superior performance of ViTaM-D in both rigid and deformable object reconstruction.
arXiv Detail & Related papers (2024-11-14T16:29:45Z)
- DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions [15.417836855005087]
We propose DiffH2O, a novel method to synthesize realistic one- or two-handed object interactions.
We decompose the task into a grasping stage and a text-based interaction stage.
In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.
arXiv Detail & Related papers (2024-03-26T16:06:42Z)
- Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method [63.49140028965778]
We present GazeHOI, the first dataset to capture simultaneous 3D modeling of gaze, hand, and object interactions.
We propose a stacked gaze-guided hand-object interaction diffusion model, named GHO-Diffusion.
We also introduce HOI-Manifold Guidance during the sampling stage of GHO-Diffusion, enabling fine-grained control over generated motions.
arXiv Detail & Related papers (2024-03-24T14:24:13Z)
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition [22.538114033191313]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z)
- BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions [49.097973114627344]
We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process.
We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions.
We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
arXiv Detail & Related papers (2023-04-12T08:12:29Z)
- Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models [18.50942770933098]
MoDiff is an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities.
Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities.
arXiv Detail & Related papers (2023-04-03T08:17:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.