A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis
- URL: http://arxiv.org/abs/2311.16471v1
- Date: Tue, 28 Nov 2023 04:13:49 GMT
- Title: A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis
- Authors: Zixiang Zhou, Yu Wan, Baoyuan Wang
- Abstract summary: We introduce a cohesive and scalable approach that consolidates multimodal (text, music, speech) and multi-part (hand, torso) human motion generation.
Our method frames the multimodal motion generation challenge as a token prediction task, drawing from specialized codebooks based on the modality of the control signal.
- Score: 17.45562922442149
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field has made significant progress in synthesizing realistic human motion driven by various modalities. Yet, the need for different methods to animate various body parts according to different control signals limits the scalability of these techniques in practical scenarios. In this paper, we introduce a cohesive and scalable approach that consolidates multimodal (text, music, speech) and multi-part (hand, torso) human motion generation. Our methodology unfolds in several steps: We begin by quantizing the motions of diverse body parts into separate codebooks tailored to their respective domains. Next, we harness the robust capabilities of pre-trained models to transcode multimodal signals into a shared latent space. We then translate these signals into discrete motion tokens by iteratively predicting subsequent tokens to form a complete sequence. Finally, we reconstruct the continuous actual motion from this tokenized sequence. Our method frames the multimodal motion generation challenge as a token prediction task, drawing from specialized codebooks based on the modality of the control signal. This approach is inherently scalable, allowing for the easy integration of new modalities. Extensive experiments demonstrated the effectiveness of our design, emphasizing its potential for broad application.
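The pipeline described in the abstract reduces to four stages: per-part motion quantization into codebooks, encoding the control signal with a pretrained model, autoregressive prediction of discrete motion tokens, and decoding tokens back to continuous motion. The sketch below illustrates only the token-prediction stage. It is not the authors' code: every module, dimension, and name (MotionTokenGenerator, the codebook sizes, the linear stand-in for the pretrained modality encoder) is a hypothetical assumption.

```python
import torch
import torch.nn as nn


class MotionTokenGenerator(nn.Module):
    """Hypothetical sketch: next-token prediction over per-part codebooks."""

    def __init__(self, codebook_sizes=None, d_model=256, n_heads=4,
                 n_layers=4, bos_id=0):
        super().__init__()
        codebook_sizes = codebook_sizes or {"torso": 512, "hand": 512}
        self.bos_id = bos_id
        # Stage 1 (assumed): one embedding table per body part, mirroring
        # the separate codebooks produced by a VQ-style motion quantizer.
        self.token_emb = nn.ModuleDict(
            {p: nn.Embedding(n + 1, d_model) for p, n in codebook_sizes.items()})
        # Stage 2 (assumed): a pretrained text/music/speech encoder would
        # supply a control latent; a linear stub stands in for it here.
        self.control_proj = nn.Linear(512, d_model)
        # Stage 3: autoregressive decoder conditioned on the control latent
        # through cross-attention.
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.heads = nn.ModuleDict(
            {p: nn.Linear(d_model, n + 1) for p, n in codebook_sizes.items()})

    @torch.no_grad()
    def generate(self, control_latent, part="torso", steps=64):
        """Greedy next-token decoding for one body part."""
        memory = self.control_proj(control_latent).unsqueeze(1)      # (B, 1, D)
        tokens = torch.full((control_latent.size(0), 1), self.bos_id,
                            dtype=torch.long, device=control_latent.device)
        for _ in range(steps):
            x = self.token_emb[part](tokens)                         # (B, T, D)
            mask = nn.Transformer.generate_square_subsequent_mask(
                x.size(1)).to(x.device)
            h = self.decoder(x, memory, tgt_mask=mask)
            next_tok = self.heads[part](h[:, -1]).argmax(-1, keepdim=True)
            tokens = torch.cat([tokens, next_tok], dim=1)
        # Stage 4 would feed tokens[:, 1:] to the part's VQ decoder to
        # reconstruct continuous motion; that decoder is omitted here.
        return tokens[:, 1:]


# Usage with a random stand-in for a pooled control feature (e.g. text).
model = MotionTokenGenerator().eval()
ctrl = torch.randn(2, 512)
print(model.generate(ctrl, part="hand", steps=16).shape)  # torch.Size([2, 16])
```

In the paper's framing, the codebook (and hence the output head) would be chosen per body part and per control modality, which is what makes the design easy to extend with new modalities.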
Related papers
- Multi-Resolution Generative Modeling of Human Motion from Limited Data [3.5229503563299915]
We present a generative model that learns to synthesize human motion from limited training sequences.
The model adeptly captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture.
arXiv Detail & Related papers (2024-11-25T15:36:29Z)
- Dynamic Motion Synthesis: Masked Audio-Text Conditioned Spatio-Temporal Transformers [13.665279127648658]
This research presents a novel motion generation framework designed to produce whole-body motion sequences conditioned on multiple modalities simultaneously.
By integrating spatial attention mechanisms and a token critic, we ensure consistency and naturalness in the generated motions.
arXiv Detail & Related papers (2024-09-03T04:19:27Z)
- FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify single-person and multi-person motion through a conditional motion distribution.
Based on our framework, existing single-person spatial control methods can be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z)
- MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models [22.044020889631188]
We introduce MambaTalk, enhancing gesture diversity and rhythm through multimodal integration.
Our method matches or exceeds the performance of state-of-the-art models.
arXiv Detail & Related papers (2024-03-14T15:10:54Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce InterControl, a novel controllable motion generation method that encourages synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances needed for human interactions can be generated by an off-the-shelf Large Language Model; a generic sketch of such a distance constraint appears after this list.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across a wide range of human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis [17.15415641710113]
We consider the problem of synthesizing multi-action human motion sequences of arbitrary lengths.
Existing approaches have mastered motion sequence generation in single-action scenarios, but fail to generalize to multi-action and arbitrary-length sequences.
We propose a novel, efficient approach that leverages the expressive power of Recurrent Transformers and the generative richness of conditional Variational Autoencoders.
arXiv Detail & Related papers (2022-06-14T10:40:16Z)
- Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis [117.15586710830489]
We focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences.
The synthesis problem is factorized into several aspects, and a hierarchical framework is proposed in which each sub-module is responsible for modeling one of them.
Experiment results show that the proposed framework remarkably outperforms previous methods in terms of diversity and naturalness.
arXiv Detail & Related papers (2022-05-25T18:20:01Z)
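As referenced in the InterControl entry above, a joint-pair distance constraint can be expressed as a simple differentiable penalty. The sketch below is a generic illustration of that idea, not InterControl's actual formulation; the joint indices, target distances, and the function name joint_distance_loss are all hypothetical.

```python
import torch


def joint_distance_loss(motion, pairs, targets):
    """Penalize deviation of selected joint-pair distances from targets.

    motion:  (T, J, 3) joint positions over T frames.
    pairs:   list of (i, j) joint-index pairs to constrain (hypothetical).
    targets: iterable of desired distances, e.g. values produced by
             prompting an off-the-shelf LLM, as the summary suggests.
    """
    losses = []
    for (i, j), d_target in zip(pairs, targets):
        d = torch.linalg.norm(motion[:, i] - motion[:, j], dim=-1)   # (T,)
        losses.append(((d - d_target) ** 2).mean())
    return torch.stack(losses).mean()


# Usage with random stand-in data: 60 frames, 22 joints, keep joints 9
# and 20 roughly 0.3 units apart over the whole sequence.
motion = torch.randn(60, 22, 3, requires_grad=True)
loss = joint_distance_loss(motion, pairs=[(9, 20)], targets=[0.3])
loss.backward()   # such gradients could steer an optimization or guidance step
```

Gradients of a penalty like this could be used to steer an optimization or diffusion-guidance step so that generated joints respect the prescribed distances.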