Unifying Human Motion Synthesis and Style Transfer with Denoising
Diffusion Probabilistic Models
- URL: http://arxiv.org/abs/2212.08526v1
- Date: Fri, 16 Dec 2022 15:15:34 GMT
- Title: Unifying Human Motion Synthesis and Style Transfer with Denoising
Diffusion Probabilistic Models
- Authors: Ziyi Chang, Edmund J. C. Findlay, Haozheng Zhang and Hubert P. H. Shum
- Abstract summary: Generating realistic motions for digital humans is a core but challenging part of computer animations and games.
We propose a denoising diffusion model solution for styled motion synthesis.
We design a multi-task diffusion-model architecture that strategically generates aspects of human motions for local guidance.
- Score: 9.789705536694665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating realistic motions for digital humans is a core but challenging
part of computer animations and games, as human motions are both diverse in
content and rich in styles. While the latest deep learning approaches have made
significant advancements in this domain, they mostly consider motion synthesis
and style manipulation as two separate problems. This is mainly due to the
challenge of learning both motion contents that account for the inter-class
behaviour and styles that account for the intra-class behaviour effectively in
a common representation. To tackle this challenge, we propose a denoising
diffusion probabilistic model solution for styled motion synthesis. Because
the stochasticity injected at each denoising step gives diffusion models high
representational capacity, we can encode both inter-class motion content and
intra-class style behaviour in the same latent space. This results in an
integrated, end-to-end
trained pipeline that facilitates the generation of optimal motion and
exploration of content-style coupled latent space. To achieve high-quality
results, we design a multi-task diffusion-model architecture that
strategically generates aspects of human motions for local guidance. We also
design adversarial and physical regularizations for global guidance. We demonstrate
superior performance with quantitative and qualitative results and validate the
effectiveness of our multi-task architecture.
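The abstract builds on the standard DDPM forward (noising) process, which perturbs a clean sample toward Gaussian noise in closed form. Below is a minimal NumPy sketch of that process; the schedule values, the `q_sample` helper, and the motion dimensions (60 frames, 23 joints) are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

T = 1000                            # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)  # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

# Treat a "motion" as a (frames, joints*3) array of joint coordinates.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((60, 69))  # hypothetical 60-frame, 23-joint clip
noise = rng.standard_normal(x0.shape)

x_mid = q_sample(x0, 500, noise)    # partially noised motion
x_end = q_sample(x0, T - 1, noise)  # nearly pure noise at the final step
```

A denoising network trained to predict `noise` from `x_mid` and `t` (the piece this paper specializes with multi-task local guidance and adversarial/physical regularization) can then reverse this process step by step at sampling time.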
Related papers
- Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models [9.739611757541535]
Our approach involves decomposing complex actions into simpler movements, specifically those observed during training.
These simpler movements are then combined into a single, realistic animation using the properties of diffusion models.
We evaluate our method by dividing two benchmark human motion datasets into basic and complex actions, and then compare its performance against the state-of-the-art.
arXiv Detail & Related papers (2024-09-18T12:32:39Z)
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
- MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models [22.044020889631188]
We introduce MambaTalk, enhancing gesture diversity and rhythm through multimodal integration.
Our method matches or exceeds the performance of state-of-the-art models.
arXiv Detail & Related papers (2024-03-14T15:10:54Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Denoising Diffusion Probabilistic Models for Styled Walking Synthesis [9.789705536694665]
We propose a framework using the denoising diffusion probabilistic model (DDPM) to synthesize styled human motions.
Experimental results show that our system can generate high-quality and diverse walking motions.
arXiv Detail & Related papers (2022-09-29T14:45:33Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis [117.15586710830489]
We focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences.
Based on this factorized scheme, a hierarchical framework is proposed, with each sub-module responsible for modeling one aspect.
Experiment results show that the proposed framework remarkably outperforms previous methods in terms of diversity and naturalness.
arXiv Detail & Related papers (2022-05-25T18:20:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.