Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
- URL: http://arxiv.org/abs/2312.14828v1
- Date: Fri, 22 Dec 2023 17:02:45 GMT
- Title: Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
- Authors: Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin
Tong
- Abstract summary: We present a divide-and-conquer framework named PRO-Motion.
It consists of three modules as motion planner, posture-diffuser and go-diffuser.
Pro-Motion can generate diverse and realistic motions from complex open-world prompts.
- Score: 43.392549755386135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional text-to-motion generation methods are usually trained on limited
text-motion pairs, making them hard to generalize to open-world scenarios. Some
works use the CLIP model to align the motion space and the text space, aiming
to enable motion generation from natural language motion descriptions. However,
they are still constrained to generate limited and unrealistic in-place
motions. To address these issues, we present a divide-and-conquer framework
named PRO-Motion, which consists of three modules as motion planner,
posture-diffuser and go-diffuser. The motion planner instructs Large Language
Models (LLMs) to generate a sequence of scripts describing the key postures in
the target motion. Differing from natural languages, the scripts can describe
all possible postures following very simple text templates. This significantly
reduces the complexity of posture-diffuser, which transforms a script to a
posture, paving the way for open-world generation. Finally, go-diffuser,
implemented as another diffusion model, estimates whole-body translations and
rotations for all postures, resulting in realistic motions. Experimental
results have shown the superiority of our method with other counterparts, and
demonstrated its capability of generating diverse and realistic motions from
complex open-world prompts such as "Experiencing a profound sense of joy". The
project page is available at https://moonsliu.github.io/Pro-Motion.
Related papers
- MotionLLM: Multimodal Motion-Language Learning with Large Language Models [69.5875073447454]
We propose MotionLLM to achieve single-human, multi-human motion generation and motion captioning.
Specifically, we encode and quantize motions into discrete LLM-understandable tokens, which results in a unified vocabulary consisting of both motion and text tokens.
Our approach is scalable and flexible, allowing easy extension to multi-human motion generation through autoregressive generation of single-human motions.
arXiv Detail & Related papers (2024-05-27T09:57:51Z) - Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z) - CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z) - Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z) - MotionScript: Natural Language Descriptions for Expressive 3D Human
Motions [8.154044578137217]
MotionScript is a motion-to-text conversion algorithm and natural language representation for human body motions.
Our experiments show that when MotionScript representations are used in a text-to-motion neural task, body movements are more accurately reconstructed.
arXiv Detail & Related papers (2023-12-19T22:33:17Z) - OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers [45.808597624491156]
We present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts.
At the pre-training stage, our model improves the generation ability by learning the rich out-of-domain inherent motion traits.
At the fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information.
arXiv Detail & Related papers (2023-12-14T14:31:40Z) - LivePhoto: Real Image Animation with Text-guided Motion Control [51.31418077586208]
This work presents a practical system, named LivePhoto, which allows users to animate an image of their interest with text descriptions.
We first establish a strong baseline that helps a well-learned text-to-image generator (i.e., Stable Diffusion) take an image as a further input.
We then equip the improved generator with a motion module for temporal modeling and propose a carefully designed training pipeline to better link texts and motions.
arXiv Detail & Related papers (2023-12-05T17:59:52Z) - Story-to-Motion: Synthesizing Infinite and Controllable Character
Animation from Long Text [14.473103773197838]
A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description.
Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive.
We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
arXiv Detail & Related papers (2023-11-13T16:22:38Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z) - TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that TEMOS framework can produce both skeleton-based animations as in prior work, as well more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.