X-MoGen: Unified Motion Generation across Humans and Animals
- URL: http://arxiv.org/abs/2508.05162v1
- Date: Thu, 07 Aug 2025 08:51:51 GMT
- Title: X-MoGen: Unified Motion Generation across Humans and Animals
- Authors: Xuan Wang, Kai Ruan, Liyang Qian, Zhizhi Guo, Chang Su, Gaoang Wang
- Abstract summary: X-MoGen is the first unified framework for cross-species text-driven motion generation covering both humans and animals. We construct UniMo4D, a large-scale dataset of 115 species and 119k motion sequences, which integrates human and animal motions under a shared skeletal topology for joint training. Experiments on UniMo4D demonstrate that X-MoGen outperforms state-of-the-art methods on both seen and unseen species.
- Score: 9.967329240441844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven motion generation has attracted increasing attention due to its broad applications in virtual reality, animation, and robotics. While existing methods typically model human and animal motion separately, a joint cross-species approach offers key advantages, such as a unified representation and improved generalization. However, morphological differences across species remain a key challenge, often compromising motion plausibility. To address this, we propose **X-MoGen**, the first unified framework for cross-species text-driven motion generation covering both humans and animals. X-MoGen adopts a two-stage architecture. First, a conditional graph variational autoencoder learns canonical T-pose priors, while an autoencoder encodes motion into a shared latent space regularized by morphological loss. In the second stage, we perform masked motion modeling to generate motion embeddings conditioned on textual descriptions. During training, a morphological consistency module is employed to promote skeletal plausibility across species. To support unified modeling, we construct **UniMo4D**, a large-scale dataset of 115 species and 119k motion sequences, which integrates human and animal motions under a shared skeletal topology for joint training. Extensive experiments on UniMo4D demonstrate that X-MoGen outperforms state-of-the-art methods on both seen and unseen species.
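The abstract sketches a concrete two-stage recipe: a stage-1 autoencoder that maps motion into a shared latent space regularized by a morphological loss, and a stage-2 masked motion model that fills in latent frames conditioned on text. Below is a minimal, hypothetical PyTorch sketch of that recipe; all module names, shapes, and loss choices are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a two-stage, cross-species motion pipeline in the
# spirit of the abstract above. Module names, shapes, and losses are
# illustrative assumptions, not the paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

J, D, T = 24, 3, 64          # assumed: joints, per-joint dims, frames
LATENT, TXT = 256, 512       # assumed latent and text-embedding sizes

class MotionAutoencoder(nn.Module):
    """Stage 1: encode motion into a shared latent space (assumed GRU backbone)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.GRU(J * D, LATENT, batch_first=True)
        self.dec = nn.GRU(LATENT, J * D, batch_first=True)

    def forward(self, motion):                      # motion: (B, T, J*D)
        z, _ = self.enc(motion)                     # (B, T, LATENT)
        recon, _ = self.dec(z)
        return z, recon

def morphological_loss(recon, bone_pairs, bone_lengths):
    """Assumed regularizer: keep bone lengths constant across frames,
    so decoded skeletons stay plausible for each species."""
    pose = recon.view(recon.shape[0], recon.shape[1], J, D)
    i, j = bone_pairs[:, 0], bone_pairs[:, 1]
    lengths = (pose[:, :, i] - pose[:, :, j]).norm(dim=-1)   # (B, T, bones)
    return F.mse_loss(lengths, bone_lengths[None, None].expand_as(lengths))

class MaskedMotionModel(nn.Module):
    """Stage 2: predict masked latent frames conditioned on a text embedding
    (assumed transformer encoder; the real conditioning scheme may differ)."""
    def __init__(self):
        super().__init__()
        self.txt_proj = nn.Linear(TXT, LATENT)
        layer = nn.TransformerEncoderLayer(LATENT, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(LATENT))

    def forward(self, z, mask, text_emb):           # mask: (B, T) bool
        z = torch.where(mask[..., None], self.mask_token, z)
        ctx = self.txt_proj(text_emb)[:, None]      # prepend text as a token
        out = self.backbone(torch.cat([ctx, z], dim=1))
        return out[:, 1:]                           # predicted latent frames

# Toy usage with random tensors standing in for UniMo4D samples.
ae, mmm = MotionAutoencoder(), MaskedMotionModel()
motion = torch.randn(2, T, J * D)
z, recon = ae(motion)
bone_pairs = torch.tensor([[0, 1], [1, 2]])         # assumed skeleton edges
bone_len = torch.ones(2)
loss_morph = morphological_loss(recon, bone_pairs, bone_len)
pred = mmm(z.detach(), torch.rand(2, T) > 0.5, torch.randn(2, TXT))
```

In the paper's full pipeline, stage 1 would also consume the canonical T-pose prior produced by the conditional graph variational autoencoder; that conditioning is omitted here for brevity.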
Related papers
- Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer [13.123185551606143]
Animal motion embodies species-specific behavioral habits, making the transfer of motion across categories a critical yet complex task for applications in animation and virtual reality. We propose a novel habit-preserved motion transfer framework for cross-category animal motion. We introduce the DeformingThings4D-skl dataset, a quadruped dataset with skeletal bindings, and conduct extensive experiments and quantitative analyses.
arXiv Detail & Related papers (2025-07-10T03:25:50Z)
- SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation [56.90807453045657]
SynMotion is a motion-customized video generation model that jointly leverages semantic guidance and visual adaptation. At the semantic level, we introduce the dual-embedding semantic comprehension mechanism which disentangles subject and motion representations. At the visual level, we integrate efficient motion adapters into a pre-trained video generation model to enhance motion fidelity and temporal coherence.
arXiv Detail & Related papers (2025-06-30T10:09:32Z)
- UniMoGen: Universal Motion Generation [1.7749928168018234]
We introduce UniMoGen, a novel UNet-based diffusion model designed for skeleton-agnostic motion generation. UniMoGen can be trained on motion data from diverse characters, without the need for a predefined maximum number of joints. Key features of UniMoGen include controllability via style and trajectory inputs, and the ability to continue motions from past frames.
arXiv Detail & Related papers (2025-05-28T00:03:39Z)
- GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
- How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects [37.10752536568922]
Motion synthesis for diverse object categories holds great potential for 3D content creation. We address the lack of comprehensive motion datasets that include a wide range of high-quality motions and annotations. We introduce rig augmentation techniques that generate diverse motion data while preserving consistent dynamics.
arXiv Detail & Related papers (2025-03-06T09:39:09Z)
- OmniMotionGPT: Animal Motion Generation with Limited Data [70.35662376853163]
We introduce AnimalML3D, the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities.
We are able to generate animal motions with high diversity and fidelity, outperforming, both quantitatively and qualitatively, baselines obtained by training human motion generation models on animal data.
arXiv Detail & Related papers (2023-11-30T07:14:00Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token; a hedged sketch of one such schedule appears after this list.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
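For the significance-determined noise schedule mentioned in the M2DM entry above, here is a hedged, self-contained sketch of what significance-ordered corruption of discrete motion tokens could look like. The significance proxy (inverse codebook frequency) and the linear masking schedule are assumptions for illustration, not the paper's actual design.

```python
# Illustrative "priority-centric" corruption schedule for discrete motion
# tokens: low-significance tokens are masked early in forward diffusion, so
# the most informative tokens survive until the final steps. All choices
# below are assumptions, not taken from the M2DM paper.
import torch

def significance(tokens, codebook_usage):
    """Assumed proxy: rarer codebook entries carry more information,
    so treat low-frequency tokens as more significant."""
    freq = codebook_usage[tokens]                 # (B, T)
    return 1.0 / (freq + 1e-6)

def corrupt(tokens, sig, step, total_steps, mask_id):
    """Mask the k least significant tokens, with k growing linearly
    over the diffusion steps."""
    ratio = step / total_steps                    # fraction of tokens to mask
    k = int(ratio * tokens.shape[1])
    if k == 0:
        return tokens
    order = sig.argsort(dim=1)                    # ascending significance
    masked = tokens.clone()
    masked.scatter_(1, order[:, :k], mask_id)
    return masked

# Toy usage: 2 sequences of 16 tokens from a 512-entry codebook.
V, MASK = 512, 512                                # MASK sits outside the codebook
tokens = torch.randint(0, V, (2, 16))
usage = torch.rand(V)                             # assumed empirical code frequencies
sig = significance(tokens, usage)
noisy = corrupt(tokens, sig, step=5, total_steps=10, mask_id=MASK)
```

The reverse (denoising) model would then be trained to recover the original tokens from these partially masked sequences, recovering high-priority tokens last.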
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.