Related papers: M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

URL: http://arxiv.org/abs/2407.14502v1
Date: Fri, 19 Jul 2024 17:57:33 GMT
Title: M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
Authors: Seunggeun Chi, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani, Kwonjoon Lee,
Abstract summary: We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from text descriptions. M2D2M adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions of motions and coherence across a series of actions.
Score: 18.125860678409804
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models. This approach adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions of motions and coherence across a series of actions. The strength of M2D2M lies in its dynamic transition probability within the discrete diffusion model, which adapts transition probabilities based on the proximity between motion tokens, encouraging mixing between different modes. Complemented by a two-phase sampling strategy that includes independent and joint denoising steps, M2D2M effectively generates long-term, smooth, and contextually coherent human motion sequences, utilizing a model trained for single-motion generation. Extensive experiments demonstrate that M2D2M surpasses current state-of-the-art benchmarks for motion generation from text descriptions, showcasing its efficacy in interpreting language semantics and generating dynamic, realistic motions.

Related papers

MotionGPT3: Human Motion as a Second Modality [20.804747077748953]
We propose MotionGPT3, a bimodal motion-language model that treats human motion as a second modality.<n>To preserve language intelligence, the text branch retains the original structure and parameters of the pretrained language model.<n>Our approach achieves competitive performance on both motion understanding and generation tasks.
arXiv Detail & Related papers (2025-06-30T17:42:22Z)
GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework.<n>Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals.<n>Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space [31.88362663621884]
We propose EnergyMoGen, which includes two spectrums of Energy-Based Models. We introduce a semantic-aware energy model based on cross-attention. This design allows the motion latent diffusion model to synthesize high-quality, complex motions.
arXiv Detail & Related papers (2024-12-19T10:19:43Z)
Multimodal Latent Language Modeling with Next-Token Diffusion [111.93906046452125]
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video) We propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers.
arXiv Detail & Related papers (2024-12-11T18:57:32Z)
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLMLM) It supports multimodal control conditions through pre-trained Large Language Models (LLMs) It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z)
MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty [7.402769693163035]
This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) It integrates skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable uncertainty. Our model consistently outperforms existing generative techniques in accurately predicting long-term motions.
arXiv Detail & Related papers (2024-10-04T18:49:00Z)
Text-driven Human Motion Generation with Motion Masked Diffusion Model [23.637853270123045]
Text human motion generation is a task that synthesizes human motion sequences conditioned on natural language. Current diffusion model-based approaches have outstanding performance in the diversity and multimodality of generation. We propose Motion Masked Diffusion Model bftext(MMDM), a novel human motion mechanism for diffusion model.
arXiv Detail & Related papers (2024-09-29T12:26:24Z)
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls [30.487510829107908]
We propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the first stage of text-to-motion semantic pre-training. We introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format.
arXiv Detail & Related papers (2024-07-30T18:57:06Z)
DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions. We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation. M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse. We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions. We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods. Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation. We are a pioneering attempt that uses DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z)
Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs. Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.