Seamless Human Motion Composition with Blended Positional Encodings
- URL: http://arxiv.org/abs/2402.15509v1
- Date: Fri, 23 Feb 2024 18:59:40 GMT
- Title: Seamless Human Motion Composition with Blended Positional Encodings
- Authors: German Barquero, Sergio Escalera and Cristina Palmero
- Abstract summary: We introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without postprocessing or redundant denoising steps.
We achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets.
- Score: 38.85158088021282
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conditional human motion generation is an important topic with many
applications in virtual reality, gaming, and robotics. While prior works have
focused on generating motion guided by text, music, or scenes, these typically
result in isolated motions confined to short durations. Instead, we address the
generation of long, continuous sequences guided by a series of varying textual
descriptions. In this context, we introduce FlowMDM, the first diffusion-based
model that generates seamless Human Motion Compositions (HMC) without any
postprocessing or redundant denoising steps. For this, we introduce the Blended
Positional Encodings, a technique that leverages both absolute and relative
positional encodings in the denoising chain. More specifically, global motion
coherence is recovered at the absolute stage, whereas smooth and realistic
transitions are built at the relative stage. As a result, we achieve
state-of-the-art results in terms of accuracy, realism, and smoothness on the
Babel and HumanML3D datasets. FlowMDM excels when trained with only a single
description per motion sequence thanks to its Pose-Centric Cross-ATtention,
which makes it robust against varying text descriptions at inference time.
Finally, to address the limitations of existing HMC metrics, we propose two new
metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt
transitions.
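The abstract names the two proposed metrics but does not define them, so the following is only a minimal sketch of how jerk-based transition metrics could be computed. It assumes that jerk is the third time derivative of joint positions, estimated with finite differences, that Peak Jerk is the maximum jerk magnitude over joints and frames, and that Area Under the Jerk integrates the absolute deviation of the jerk curve from a reference level. The function names, the `fps` argument, and the `baseline` parameter are hypothetical and do not come from the paper or its code.

```python
from typing import List, Sequence

# One frame is a list of joints; each joint is a list of coordinates (e.g. x, y, z).
Frame = Sequence[Sequence[float]]


def jerk_curve(positions: Sequence[Frame], fps: float) -> List[float]:
    """Per-frame maximum jerk magnitude over joints.

    Jerk is approximated with a third-order forward finite difference,
    so the result has len(positions) - 3 entries.
    """
    dt = 1.0 / fps
    curve = []
    for t in range(len(positions) - 3):
        frame_peak = 0.0
        for j in range(len(positions[t])):
            squared = 0.0
            for c in range(len(positions[t][j])):
                p0, p1, p2, p3 = (positions[t + k][j][c] for k in range(4))
                d3 = (p3 - 3.0 * p2 + 3.0 * p1 - p0) / dt ** 3
                squared += d3 * d3
            frame_peak = max(frame_peak, squared ** 0.5)
        curve.append(frame_peak)
    return curve


def peak_jerk(curve: Sequence[float]) -> float:
    """Assumed definition: the largest value of the jerk curve."""
    return max(curve)


def area_under_jerk(curve: Sequence[float], fps: float, baseline: float = 0.0) -> float:
    """Assumed definition: trapezoidal area of |jerk - baseline| over time.

    `baseline` is a hypothetical reference level, for example an average
    jerk measured on real motion data.
    """
    dt = 1.0 / fps
    deviation = [abs(value - baseline) for value in curve]
    return sum(0.5 * (a + b) * dt for a, b in zip(deviation, deviation[1:]))


# Toy example: a single joint that jumps abruptly between two poses.
abrupt = [[[x, 0.0, 0.0]] for x in (0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)]
abrupt_curve = jerk_curve(abrupt, fps=1.0)
```

Under these assumptions, an abrupt transition shows up as a spike in the jerk curve, which raises both values, while a smoothly blended transition keeps the curve close to the reference level.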
Related papers
- FTMoMamba: Motion Generation with Frequency and Text State Space Models [53.60865359814126]
We propose a novel diffusion-based FTMoMamba framework equipped with a Frequency State Space Model and a Text State Space Model.
To learn fine-grained representation, FreqSSM decomposes sequences into low-frequency and high-frequency components.
To ensure the consistency between text and motion, TextSSM encodes text features at the sentence level.
arXiv Detail & Related papers (2024-11-26T15:48:12Z)
- DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language.
DART is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control.
We present effective algorithms for both approaches, demonstrating our model's versatility and superior performance in various motion synthesis tasks.
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
- Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer [62.29951737214263]
Existing algorithms directly generate the full sequence, which is expensive and prone to errors.
We propose KeyMotion, which generates plausible human motion sequences corresponding to input text.
We use a Variational Autoencoder (VAE) with Kullback-Leibler regularization to project the keyframes into a latent space.
For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the keyframe latents and text condition.
arXiv Detail & Related papers (2024-05-24T11:12:37Z)
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.