FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
- URL: http://arxiv.org/abs/2312.15004v1
- Date: Fri, 22 Dec 2023 16:56:02 GMT
- Title: FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
- Authors: Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei
Liu
- Abstract summary: FineMoGen is a diffusion-based motion generation and editing framework.
It can synthesize fine-grained motions with spatio-temporal composition that follows user instructions.
FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models.
- Score: 56.29102849106382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven motion generation has achieved substantial progress with the
emergence of diffusion models. However, existing methods still struggle to
generate complex motion sequences that correspond to fine-grained descriptions,
depicting detailed and accurate spatio-temporal actions. This lack of
fine-grained controllability limits the use of motion generation by a broader
audience. To tackle these challenges, we present FineMoGen, a diffusion-based
motion generation and editing framework that can synthesize fine-grained
motions with spatio-temporal composition that follows user instructions.
Specifically, FineMoGen builds upon a diffusion model with a novel transformer
architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes
the generation of the
global attention template from two perspectives: 1) explicitly modeling the
constraints of spatio-temporal composition; and 2) utilizing sparsely-activated
mixture-of-experts to adaptively extract fine-grained features. To facilitate a
large-scale study on this new fine-grained motion generation task, we
contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336
fine-grained spatio-temporal descriptions. Extensive experiments validate that
FineMoGen exhibits superior motion generation quality over state-of-the-art
methods. Notably, FineMoGen further enables zero-shot motion editing
capabilities with the aid of modern large language models (LLMs), faithfully
manipulating motion sequences according to fine-grained instructions. Project
Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html
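
The sparsely-activated mixture-of-experts mentioned in the abstract can be pictured with a short sketch. The following PyTorch snippet is an illustrative approximation, not the authors' implementation: the module name `SparseMoE`, the expert widths, and the top-k routing scheme are all assumptions; it only shows how per-frame motion features could be routed to a small set of experts to extract fine-grained features.

```python
# Illustrative sketch (not the FineMoGen code): a sparsely-activated
# mixture-of-experts layer of the kind SAMI is described as using.
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        # Each expert is a small feed-forward network (assumed shape).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # router over experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) per-frame motion features
        logits = self.gate(x)                            # (B, T, E)
        weights, idx = logits.topk(self.top_k, dim=-1)   # sparse routing per frame
        weights = weights.softmax(dim=-1)                # normalize over selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                            # (B, T, k) routing mask
            if mask.any():
                # Dense compute with masked weighting; a real sparse
                # implementation would dispatch only the routed frames.
                w = (weights * mask).sum(dim=-1, keepdim=True)  # (B, T, 1)
                out = out + w * expert(x)
        return out


# Example: route per-frame features of a 196-frame motion through the experts.
feats = torch.randn(2, 196, 256)                 # (batch, frames, feature dim)
moe = SparseMoE(dim=256, num_experts=4, top_k=1)
print(moe(feats).shape)                          # torch.Size([2, 196, 256])
```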
Related papers
- Infinite Motion: Extended Motion Generation via Long Text Instructions [51.61117351997808]
"Infinite Motion" is a novel approach that leverages long text to extended motion generation.
Key innovation of our model is its ability to accept arbitrary lengths of text as input.
We incorporate the timestamp design for text which allows precise editing of local segments within the generated sequences.
arXiv Detail & Related papers (2024-07-11T12:33:56Z)
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z)
- Motion Mamba: Efficient and Long Sequence Motion Generation [26.777455596989526]
Recent advancements in state space models (SSMs) have showcased considerable promise in long sequence modeling.
We propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model built on SSMs.
Our proposed method achieves up to a 50% FID improvement and up to 4 times faster inference on the HumanML3D and KIT-ML datasets.
arXiv Detail & Related papers (2024-03-12T10:25:29Z)
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks (a minimal few-step sampling sketch appears after this list).
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- Example-based Motion Synthesis via Generative Motion Matching [44.20519633463265]
We present GenMM, a generative model that "mines" as many diverse motions as possible from a single or few example sequences.
GenMM inherits the training-free nature and the superior quality of the well-known Motion Matching method.
arXiv Detail & Related papers (2023-06-01T06:19:33Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varying text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
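
As a closing note on the few-step sampling claim in the Motion Flow Matching entry above, here is a minimal, hypothetical sketch of flow-matching-style sampling: a learned velocity field is integrated with a fixed, small number of Euler steps instead of a long denoising chain. The `VelocityField` network, the feature dimension, and the step count are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical few-step sampler in the spirit of flow matching.
import torch
import torch.nn as nn


class VelocityField(nn.Module):
    # Stand-in for a (possibly text-conditioned) motion velocity network.
    def __init__(self, dim: int = 263):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 512), nn.SiLU(), nn.Linear(512, dim))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) motion features, t: (batch,) time in [0, 1]
        t_feat = t.view(-1, 1, 1).expand(x.shape[0], x.shape[1], 1)
        return self.net(torch.cat([x, t_feat], dim=-1))


@torch.no_grad()
def sample_motion(v_theta: VelocityField, frames: int = 196, dim: int = 263, steps: int = 10):
    # Start from Gaussian noise and integrate dx/dt = v_theta(x, t) with Euler steps.
    x = torch.randn(1, frames, dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * v_theta(x, t)
    return x


motion = sample_motion(VelocityField())
print(motion.shape)  # torch.Size([1, 196, 263])
```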