Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
- URL: http://arxiv.org/abs/2312.14828v1
- Date: Fri, 22 Dec 2023 17:02:45 GMT
- Title: Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
- Authors: Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin
Tong
- Abstract summary: We present a divide-and-conquer framework named PRO-Motion.
It consists of three modules: a motion planner, a posture-diffuser and a go-diffuser.
PRO-Motion can generate diverse and realistic motions from complex open-world prompts.
- Score: 43.392549755386135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional text-to-motion generation methods are usually trained on limited
text-motion pairs, making them hard to generalize to open-world scenarios. Some
works use the CLIP model to align the motion space and the text space, aiming
to enable motion generation from natural language motion descriptions. However,
they are still constrained to generating limited and unrealistic in-place
motions. To address these issues, we present a divide-and-conquer framework
named PRO-Motion, which consists of three modules: a motion planner, a
posture-diffuser and a go-diffuser. The motion planner instructs Large Language
Models (LLMs) to generate a sequence of scripts describing the key postures in
the target motion. Unlike natural language, the scripts can describe all
possible postures following very simple text templates. This significantly
reduces the complexity of the posture-diffuser, which transforms a script into
a posture, paving the way for open-world generation. Finally, the go-diffuser,
implemented as another diffusion model, estimates whole-body translations and
rotations for all postures, resulting in realistic motions. Experimental
results show the superiority of our method over other counterparts and
demonstrate its capability to generate diverse and realistic motions from
complex open-world prompts such as "Experiencing a profound sense of joy". The
project page is available at https://moonsliu.github.io/Pro-Motion.
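The three-stage pipeline described in the abstract amounts to a planner followed by two conditional samplers. The sketch below illustrates only that control flow; the names plan_key_postures, PostureDiffuser, GoDiffuser and the llm.generate interface are assumptions made for readability, not the authors' actual code or API.

```python
# A minimal, hypothetical sketch of the PRO-Motion pipeline described above.
# All names, prompts and interfaces are illustrative assumptions.
from typing import Any, List


def plan_key_postures(prompt: str, llm: Any) -> List[str]:
    """Motion planner: instruct an LLM to emit simple templated posture scripts."""
    instruction = (
        "List the key postures of the motion below, one short template "
        f"sentence per line.\nMotion: {prompt}"
    )
    reply = llm.generate(instruction)  # assumed LLM text-completion interface
    return [line.strip() for line in reply.splitlines() if line.strip()]


class PostureDiffuser:
    """Posture-diffuser: maps a single posture script to a static body pose."""

    def sample(self, script: str) -> Any:
        raise NotImplementedError  # diffusion sampling conditioned on one script


class GoDiffuser:
    """Go-diffuser: adds whole-body translations/rotations across the postures."""

    def sample(self, postures: List[Any]) -> Any:
        raise NotImplementedError  # diffusion sampling over the keyframe postures


def pro_motion(prompt: str, llm: Any, posture_diffuser: PostureDiffuser,
               go_diffuser: GoDiffuser) -> Any:
    scripts = plan_key_postures(prompt, llm)                   # 1. plan
    postures = [posture_diffuser.sample(s) for s in scripts]   # 2. posture
    return go_diffuser.sample(postures)                        # 3. go
```

Under these assumptions, a call such as pro_motion("Experiencing a profound sense of joy", llm, PostureDiffuser(), GoDiffuser()) would first ask the LLM for posture scripts, then synthesise key postures, and finally fill in global translation and rotation.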
Related papers
- Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation [74.94730615777212]
Text-to-motion generation is a crucial task in computer vision, which generates the target 3D motion from the given text.
The limited scale of current annotated datasets only allows existing methods to learn a mapping from a sub-text-space to a sub-motion-space.
This paper proposes to leverage atomic motions as an intermediate representation through two sequentially coupled steps, i.e., Textual Decomposition and Sub-motion-space Scattering.
arXiv Detail & Related papers (2024-11-06T17:57:43Z)
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM).
It supports multimodal control conditions through pre-trained Large Language Models (LLMs).
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z) - LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning [19.801187860991117]
This work introduces LaMP, a novel Language-Motion Pretraining model.
LaMP generates motion-informative text embeddings, significantly enhancing the relevance and semantics of generated motion sequences.
For captioning, we finetune a large language model with the language-informative motion features to develop a strong motion captioning model.
arXiv Detail & Related papers (2024-10-09T17:33:03Z)
- Unimotion: Unifying 3D Human Motion Synthesis and Understanding [47.18338511861108]
We introduce Unimotion, the first unified multi-task human motion model capable of both flexible motion control and frame-level motion understanding.
Unimotion allows motion to be controlled with global text, local frame-level text, or both at once, providing more flexible control for users.
arXiv Detail & Related papers (2024-09-24T09:20:06Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z)
- Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z)
- LivePhoto: Real Image Animation with Text-guided Motion Control [51.31418077586208]
This work presents a practical system, named LivePhoto, which allows users to animate an image of their interest with text descriptions.
We first establish a strong baseline that helps a well-learned text-to-image generator (i.e., Stable Diffusion) take an image as a further input.
We then equip the improved generator with a motion module for temporal modeling and propose a carefully designed training pipeline to better link texts and motions.
arXiv Detail & Related papers (2023-12-05T17:59:52Z)
- Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text [14.473103773197838]
A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description.
Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive.
We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
arXiv Detail & Related papers (2023-11-13T16:22:38Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.