MCM: Multi-condition Motion Synthesis Framework
- URL: http://arxiv.org/abs/2404.12886v1
- Date: Fri, 19 Apr 2024 13:40:25 GMT
- Title: MCM: Multi-condition Motion Synthesis Framework
- Authors: Zeyu Ling, Bo Han, Yongkang Wong, Han Lin, Mohan Kankanhalli, Weidong Geng
- Abstract summary: Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions.
We propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch.
- Score: 15.726843047963664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio are the two predominant modalities used as HMS control conditions. While existing research has primarily focused on single conditions, multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch. This framework extends the applicability of a diffusion model originally built for textual conditions alone to auditory conditions, covering both music-to-dance and co-speech HMS while preserving the motion quality and semantic-association capabilities of the original model. Furthermore, we propose a Transformer-based diffusion model, designated MWNet, as the main branch. Through its multi-wise self-attention modules, MWNet captures the spatial structure and inter-joint correlations of motion sequences. Extensive experiments show that our method achieves competitive results in both single-condition and multi-condition HMS tasks.
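As a concrete illustration of the dual-branch design described in the abstract, the following is a minimal PyTorch sketch: a text-conditioned main branch, a control branch that mirrors its blocks and consumes audio features, and zero-initialized fusion layers (a ControlNet-style choice assumed here) so the control branch is silent at initialization. All module names, dimensions (e.g. `motion_dim=263`), and the plain temporal self-attention standing in for MWNet's multi-wise self-attention are illustrative assumptions, not the authors' released implementation.
```python
# Sketch of a dual-branch (main + control) diffusion denoiser.
# Shapes, layer sizes, and the zero-initialized fusion are assumptions.
import torch
import torch.nn as nn

class DenoiserBlock(nn.Module):
    """One Transformer encoder block over the motion sequence (time axis)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class DualBranchDenoiser(nn.Module):
    """Main branch: text-conditioned denoiser. Control branch: mirrors the
    main blocks, consumes audio features, and injects residuals through
    zero-initialized projections, so training starts from the text-only model."""
    def __init__(self, motion_dim=263, dim=256, depth=4, text_dim=512, audio_dim=128):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, dim)
        self.text_proj = nn.Linear(text_dim, dim)
        self.audio_proj = nn.Linear(audio_dim, dim)
        self.time_emb = nn.Embedding(1000, dim)  # diffusion-step embedding
        self.main = nn.ModuleList(DenoiserBlock(dim) for _ in range(depth))
        self.ctrl = nn.ModuleList(DenoiserBlock(dim) for _ in range(depth))
        self.fuse = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        for f in self.fuse:          # zero-init: control branch starts silent
            nn.init.zeros_(f.weight)
            nn.init.zeros_(f.bias)
        self.out_proj = nn.Linear(dim, motion_dim)

    def forward(self, x_t, t, text_emb, audio_feat=None):
        # x_t: (B, T, motion_dim) noisy motion; t: (B,) diffusion step
        # text_emb: (B, text_dim); audio_feat: (B, T, audio_dim) or None
        cond = (self.text_proj(text_emb) + self.time_emb(t)).unsqueeze(1)
        h = self.in_proj(x_t) + cond
        c = h + self.audio_proj(audio_feat) if audio_feat is not None else None
        for blk, cblk, fuse in zip(self.main, self.ctrl, self.fuse):
            if c is not None:
                c = cblk(c)
                h = h + fuse(c)      # residual injection from control branch
            h = blk(h)
        return self.out_proj(h)      # predicted clean motion (or noise)

# usage: out = DualBranchDenoiser()(torch.randn(2, 60, 263),
#                                   torch.randint(0, 1000, (2,)),
#                                   torch.randn(2, 512), torch.randn(2, 60, 128))
```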
Related papers
- Multi-Resolution Generative Modeling of Human Motion from Limited Data [3.5229503563299915]
We present a generative model that learns to synthesize human motion from limited training sequences.
The model adeptly captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture.
arXiv Detail & Related papers (2024-11-25T15:36:29Z)
- GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation [23.435588151215594]
We propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis.
The framework adopts a strategy named GradUally Enriching SyntheSis, abbreviated as GUESS.
We show that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realism, and diversity.
arXiv Detail & Related papers (2024-01-04T08:48:21Z)
- MCM: Multi-condition Motion Synthesis Framework for Multi-scenario [28.33039094451924]
We introduce MCM, a novel paradigm for motion synthesis that spans multiple scenarios under diverse conditions.
The MCM framework is able to integrate with any DDPM-like diffusion model to accommodate multi-conditional information input.
Our approach achieves SoTA results in text-to-motion tasks and competitive results in music-to-dance tasks, comparable to task-specific methods.
arXiv Detail & Related papers (2023-09-06T14:17:49Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame; a minimal sketch of this sampling loop appears after this entry.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
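To make the mechanism in the A-MDM entry above concrete, here is a minimal sketch of an auto-regressive sampling loop of that kind, assuming a trained per-frame reverse-diffusion model. The `denoise_step` callable, its signature, and the dummy denoiser in the usage comment are hypothetical placeholders, not the paper's API.
```python
# Hypothetical auto-regressive sampling in the spirit of A-MDM: each frame
# is produced by a full reverse-diffusion pass conditioned on the previous one.
import torch

@torch.no_grad()
def sample_motion(denoise_step, init_pose, num_frames, num_steps=50):
    """Generate `num_frames` poses auto-regressively.

    denoise_step(x_t, t, prev_frame) -> x_{t-1} stands in for the trained
    per-frame denoiser; its signature is an assumption.
    """
    frames = [init_pose]                        # init_pose: (pose_dim,)
    for _ in range(num_frames - 1):
        x = torch.randn_like(init_pose)         # start the next frame from noise
        for t in reversed(range(num_steps)):    # full reverse pass per frame
            x = denoise_step(x, t, frames[-1])  # conditioned on previous frame
        frames.append(x)
    return torch.stack(frames)                  # (num_frames, pose_dim)

# usage with a dummy denoiser, just to show the call shape:
# motion = sample_motion(lambda x, t, prev: 0.9 * x + 0.1 * prev,
#                        torch.zeros(69), num_frames=8)
```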
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion [54.512440195060584]
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of the Score-based Generative Model (SGM).
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on the BraTS19 dataset show that UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular areas in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z)
- Optimal transport for causal discovery [13.38095181298957]
We provide a novel dynamical-system view of Functional Causal Models (FCMs).
We then propose a new framework for identifying causal direction in the bivariate case.
Our method demonstrates state-of-the-art results on both synthetic and causal discovery benchmark datasets.
arXiv Detail & Related papers (2022-01-23T21:09:45Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)