Pretrained Diffusion Models for Unified Human Motion Synthesis
- URL: http://arxiv.org/abs/2212.02837v1
- Date: Tue, 6 Dec 2022 09:19:21 GMT
- Title: Pretrained Diffusion Models for Unified Human Motion Synthesis
- Authors: Jianxin Ma, Shuai Bai, Chang Zhou
- Abstract summary: MoFusion is a framework for unified motion synthesis.
It employs a Transformer backbone to ease the inclusion of diverse control signals.
It also supports multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative modeling of human motion has broad applications in computer
animation, virtual reality, and robotics. Conventional approaches develop
separate models for different motion synthesis tasks, and typically use a model
of a small size to avoid overfitting the scarce data available in each setting.
It remains an open question whether developing a single unified model is
feasible. Such a model may 1) benefit the acquisition of novel skills by
combining skills learned from multiple tasks, and 2) help increase model
capacity without overfitting by combining multiple data sources. Unification is
challenging because 1) it involves diverse control signals as well as targets
of varying granularity, and 2) motion datasets may use different skeletons and
default poses. In this paper, we present MoFusion, a framework for unified
motion synthesis. MoFusion employs a Transformer backbone to ease the inclusion
of diverse control signals via cross attention, and pretrains the backbone as a
diffusion model to support multi-granularity synthesis ranging from motion
completion of a body part to whole-body motion generation. It uses a learnable
adapter to accommodate the differences between the default skeletons used by
the pretraining and the fine-tuning data. Empirical results show that
pretraining is vital for scaling the model size without overfitting, and
demonstrate MoFusion's potential in various tasks, e.g., text-to-motion, motion
completion, and zero-shot mixing of multiple control signals. Project page:
https://ofa-sys.github.io/MoFusion/
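As a concrete illustration of the three mechanisms the abstract names (a Transformer backbone, cross-attention over control signals, and a learnable skeleton adapter), here is a minimal PyTorch sketch. It is not the authors' implementation: every module name, dimension, and wiring choice below is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class SkeletonAdapter(nn.Module):
    """Learnable map from a dataset-specific pose layout onto the
    pretraining layout (a stand-in for the paper's adapter)."""
    def __init__(self, src_dim, tgt_dim):
        super().__init__()
        self.proj = nn.Linear(src_dim, tgt_dim)

    def forward(self, pose):                 # (batch, frames, src_dim)
        return self.proj(pose)               # (batch, frames, tgt_dim)

class MotionDenoiser(nn.Module):
    """Predicts the noise in a corrupted motion sequence while attending
    to control embeddings (e.g., text features) via cross-attention."""
    def __init__(self, pose_dim=72, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.step_emb = nn.Embedding(1000, d_model)   # diffusion-step embedding
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerDecoder(layer, n_layers)  # self- + cross-attn
        self.out_proj = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_motion, t, control):
        # noisy_motion: (B, T, pose_dim); t: (B,); control: (B, L, d_model)
        h = self.in_proj(noisy_motion) + self.step_emb(t)[:, None, :]
        h = self.backbone(tgt=h, memory=control)      # cross-attend to controls
        return self.out_proj(h)

# Toy usage: adapt a hypothetical 21-joint dataset (63-dim poses) onto a
# 72-dim pretraining layout, then run one denoising prediction.
adapter = SkeletonAdapter(src_dim=63, tgt_dim=72)
x0 = adapter(torch.randn(2, 60, 63))                  # motion in shared layout
x_t = x0 + torch.randn_like(x0)                       # crudely noised sample
t = torch.randint(0, 1000, (2,))
text_feats = torch.randn(2, 8, 256)                   # stand-in text embedding
eps_hat = MotionDenoiser()(x_t, t, text_feats)
print(eps_hat.shape)                                  # torch.Size([2, 60, 72])
```

Under this reading, body-part completion versus whole-body generation would differ only in which dimensions of the sequence are noised and denoised, consistent with the multi-granularity claim above.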
Related papers
- Taming Diffusion Probabilistic Models for Character Control
We present a novel character control framework that responds in real-time to a variety of user-supplied control signals.
At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model.
Our work represents the first model that enables real-time generation of high-quality, diverse character animations.
arXiv Detail & Related papers (2024-04-23T15:20:17Z)
- Large Motion Model for Unified Multi-Modal Motion Generation
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles the challenges of unified multi-modal motion generation from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the desired distances between joint pairs for human interactions can be generated by an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
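The InterControl entry above hinges on keeping synthesized joint pairs at desired distances. A minimal sketch of such a differentiable joint-distance penalty, usable for gradient-based guidance, follows; the joint indices, target distance, and update step are hypothetical, not taken from the paper.

```python
import torch

def joint_distance_loss(joints, pairs, targets):
    """joints: (B, T, J, 3) positions; pairs: (P, 2) joint indices;
    targets: (P,) desired distances in the same units as `joints`."""
    a = joints[:, :, pairs[:, 0], :]                  # (B, T, P, 3)
    b = joints[:, :, pairs[:, 1], :]
    dist = (a - b).norm(dim=-1)                       # (B, T, P)
    return ((dist - targets) ** 2).mean()

# One gradient-guidance step on a candidate motion (toy numbers throughout).
joints = torch.randn(1, 30, 22, 3, requires_grad=True)
pairs = torch.tensor([[9, 21]])                       # e.g., two hands interacting
targets = torch.tensor([0.1])                         # keep them ~10 cm apart
joint_distance_loss(joints, pairs, targets).backward()
with torch.no_grad():
    joints -= 0.1 * joints.grad                       # nudge toward the constraint
```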
- TapMo: Shape-aware Motion Generation of Skeleton-free Characters
We present TapMo, a Text-driven Animation Pipeline for generating motion in a broad spectrum of skeleton-free 3D characters.
TapMo comprises two main components - Mesh Handle Predictor and Shape-aware Diffusion Module.
arXiv Detail & Related papers (2023-10-19T12:14:32Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models
MoDiff is an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities.
Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities.
arXiv Detail & Related papers (2023-04-03T08:17:08Z)
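The MoDiff entry above describes autoregressive diffusion over motion conditioned on control contexts. The sketch below shows only the autoregressive rollout pattern, under stated assumptions: the per-step update is a deliberately crude placeholder, and a real model would be the paper's cross-modal encoder and Transformer decoder.

```python
import torch

def rollout(denoiser, control, n_windows=4, win=16, pose_dim=72, steps=50):
    """Generate motion window by window; each window's reverse diffusion is
    conditioned on everything generated so far plus the control context."""
    past = torch.zeros(1, 0, pose_dim)                # nothing generated yet
    for _ in range(n_windows):
        x = torch.randn(1, win, pose_dim)             # start the window from noise
        for t in reversed(range(steps)):              # simplified reverse process
            eps = denoiser(x, torch.tensor([t]), past, control)
            x = x - eps / steps                       # placeholder update rule
        past = torch.cat([past, x], dim=1)            # condition the next window
    return past                                       # (1, n_windows * win, D)

# Stub denoiser so the sketch runs end to end.
stub = lambda x, t, past, ctrl: 0.1 * x
print(rollout(stub, control=torch.randn(1, 8, 128)).shape)  # (1, 64, 72)
```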
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
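The MultiView ICA entry states its generative model explicitly: each subject's data is a linear combination of shared independent sources plus noise, x_i = A_i s + n_i. Below is a toy NumPy simulation of that forward model; the shapes and noise level are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_sensors, n_samples, n_subjects = 4, 10, 500, 3

# Shared independent sources (Laplacian, hence non-Gaussian).
s = rng.laplace(size=(n_sources, n_samples))

subjects = []
for i in range(n_subjects):
    A_i = rng.normal(size=(n_sensors, n_sources))         # subject-specific mixing
    n_i = 0.1 * rng.normal(size=(n_sensors, n_samples))   # additive sensor noise
    subjects.append(A_i @ s + n_i)                        # observed data x_i

# MultiView ICA jointly unmixes all subjects to recover the shared `s`;
# each x_i alone would be an ordinary noisy ICA problem.
print(subjects[0].shape)                                  # (10, 500)
```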