Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
- URL: http://arxiv.org/abs/2305.09662v1
- Date: Tue, 16 May 2023 17:58:43 GMT
- Title: Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
- Authors: Samaneh Azadi, Akbar Shah, Thomas Hayes, Devi Parikh, Sonal Gupta
- Abstract summary: We introduce Make-An-Animation, a text-conditioned human motion generation model.
It learns more diverse poses and prompts from large-scale image-text datasets.
It reaches state-of-the-art performance on text-to-motion generation.
- Score: 47.272177594990104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided human motion generation has drawn significant interest because of
its impactful applications spanning animation and robotics. Recently, the
application of diffusion models for motion generation has enabled improvements
in the quality of generated motions. However, existing approaches are limited
by their reliance on relatively small-scale motion capture data, leading to
poor performance on more diverse, in-the-wild prompts. In this paper, we
introduce Make-An-Animation, a text-conditioned human motion generation model
which learns more diverse poses and prompts from large-scale image-text
datasets, enabling significant improvement in performance over prior works.
Make-An-Animation is trained in two stages. First, we train on a curated
large-scale dataset of (text, static pseudo-pose) pairs extracted from
image-text datasets. Second, we fine-tune on motion capture data, adding
additional layers to model the temporal dimension. Unlike prior diffusion
models for motion generation, Make-An-Animation uses a U-Net architecture
similar to recent text-to-video generation models. Human evaluation of motion
realism and alignment with input text shows that our model reaches
state-of-the-art performance on text-to-motion generation.
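The two-stage recipe described in the abstract (pre-train a text-conditioned denoiser on static pseudo-poses mined from image-text data, then fine-tune on motion capture after adding temporal layers) can be illustrated with a minimal sketch. The paper uses a U-Net similar to text-to-video models; the sketch below substitutes a small per-frame MLP for brevity, and all module names, dimensions, and the pose parameterization are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of the two-stage recipe described in the abstract.
# Module names, tensor shapes, and the pose parameterization are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn

POSE_DIM = 72   # e.g. SMPL-style axis-angle pose vector (assumption)
TEXT_DIM = 512  # dimensionality of a frozen text encoder's embedding (assumption)


class PoseDenoiser(nn.Module):
    """Stage 1: denoises a single static pose conditioned on text."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE_DIM + TEXT_DIM + 1, 1024),  # +1 for the diffusion timestep
            nn.SiLU(),
            nn.Linear(1024, POSE_DIM),
        )

    def forward(self, noisy_pose, t, text_emb):
        # Predict the noise added to the pose (standard DDPM objective).
        x = torch.cat([noisy_pose, text_emb, t[:, None].float()], dim=-1)
        return self.net(x)


class TemporalPoseDenoiser(nn.Module):
    """Stage 2: wraps the pretrained per-frame denoiser and adds a temporal layer."""

    def __init__(self, frame_model: PoseDenoiser):
        super().__init__()
        self.frame_model = frame_model  # initialized from stage 1
        self.temporal = nn.Conv1d(POSE_DIM, POSE_DIM, kernel_size=3, padding=1)
        nn.init.zeros_(self.temporal.weight)  # new layer starts as a no-op residual
        nn.init.zeros_(self.temporal.bias)

    def forward(self, noisy_motion, t, text_emb):
        # noisy_motion: (batch, frames, POSE_DIM)
        b, f, d = noisy_motion.shape
        per_frame = self.frame_model(
            noisy_motion.reshape(b * f, d),
            t.repeat_interleave(f),
            text_emb.repeat_interleave(f, dim=0),
        ).reshape(b, f, d)
        # Residual temporal mixing across the frame axis.
        return per_frame + self.temporal(per_frame.transpose(1, 2)).transpose(1, 2)


if __name__ == "__main__":
    stage1 = PoseDenoiser()
    # ... train stage1 on (text, static pseudo-pose) pairs from image-text data ...
    stage2 = TemporalPoseDenoiser(stage1)
    # ... fine-tune stage2 on motion-capture sequences ...
    motion = torch.randn(2, 16, POSE_DIM)
    t = torch.randint(0, 1000, (2,))
    text = torch.randn(2, TEXT_DIM)
    print(stage2(motion, t, text).shape)  # torch.Size([2, 16, 72])
```

Zero-initializing the new temporal layer lets stage 2 start from the stage-1 per-frame behaviour and learn temporal structure as a residual, which is a common way to add layers to a pretrained denoiser; whether the paper uses this exact initialization is not stated in the abstract.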
Related papers
- Strong and Controllable 3D Motion Generation [0.0]
We introduce Motion ControlNet, which enables more precise joint-level control of human motion compared to previous text-to-motion generation methods.
These contributions represent a significant advancement for text-to-motion generation, bringing it closer to real-world applications.
arXiv Detail & Related papers (2025-01-30T20:06:30Z)
- PackDiT: Joint Human Motion and Text Generation via Mutual Prompting [22.53146582495341]
PackDiT is the first diffusion-based generative model capable of performing various tasks simultaneously.
We train PackDiT on the HumanML3D dataset, achieving state-of-the-art text-to-motion performance with an FID score of 0.106.
Our experiments further demonstrate that diffusion models are effective for motion-to-text generation, achieving performance comparable to that of autoregressive models.
arXiv Detail & Related papers (2025-01-27T22:51:45Z)
- Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation [43.915871360698546]
2D human videos offer a vast and accessible source of motion data, covering a wider range of styles and activities.
We introduce a novel framework that disentangles local joint motion from global movements, enabling efficient learning of local motion priors from 2D data.
Our method efficiently utilizes 2D data to support realistic 3D human motion generation and to broaden the range of motion types covered (see the decomposition sketch after this list).
arXiv Detail & Related papers (2024-12-17T17:34:52Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense motion trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models [70.78051873517285]
We present MotionBase, the first million-level motion generation benchmark.
By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions.
We introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity.
arXiv Detail & Related papers (2024-10-04T10:48:54Z)
- MotionFix: Text-Driven 3D Human Motion Editing [52.11745508960547]
Key challenges include the scarcity of training data and the need to design a model that accurately edits the source motion.
We propose a methodology to semi-automatically collect a dataset of triplets comprising (i) a source motion, (ii) a target motion, and (iii) an edit text.
Access to this data allows us to train a conditional diffusion model, TMED, that takes both the source motion and the edit text as input (see the conditioning sketch after this list).
arXiv Detail & Related papers (2024-08-01T16:58:50Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z)
- OmniMotionGPT: Animal Motion Generation with Limited Data [70.35662376853163]
We introduce AnimalML3D, the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities.
We are able to generate animal motions with high diversity and fidelity, quantitatively and qualitatively outperforming the results of training human motion generation baselines on animal data.
arXiv Detail & Related papers (2023-11-30T07:14:00Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on individual body parts and supports arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
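As referenced in the Motion-2-to-3 entry above, the idea of disentangling local joint motion from global movement can be sketched as a simple split of world-space joint positions into a root trajectory and root-relative motion. The joint layout, tensor shapes, and root index below are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of local/global motion disentangling (assumed joint layout).
import torch

ROOT = 0  # index of the pelvis/root joint (assumption)


def split_motion(joints: torch.Tensor):
    """joints: (frames, num_joints, 3) world-space positions."""
    global_traj = joints[:, ROOT, :]             # (frames, 3) root path
    local = joints - global_traj[:, None, :]     # root-relative joint motion
    return global_traj, local


def merge_motion(global_traj: torch.Tensor, local: torch.Tensor):
    """Inverse of split_motion."""
    return local + global_traj[:, None, :]


if __name__ == "__main__":
    motion = torch.randn(120, 24, 3)             # 120 frames, 24 joints (assumption)
    traj, local = split_motion(motion)
    assert torch.allclose(merge_motion(traj, local), motion)
```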
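As referenced in the MotionFix entry above, the (source motion, target motion, edit text) triplet and a denoiser conditioned on the source motion plus the edit text can be sketched as follows. The dataclass fields and the denoiser signature are illustrative assumptions, not the released TMED model.

```python
# Hedged sketch of triplet data and source+text conditioning for motion editing.
from dataclasses import dataclass
import torch
import torch.nn as nn


@dataclass
class EditTriplet:
    source_motion: torch.Tensor  # (frames, pose_dim)
    target_motion: torch.Tensor  # (frames, pose_dim), ground truth for training
    edit_text: str               # e.g. "raise the left arm higher" (hypothetical)


class EditDenoiser(nn.Module):
    """Denoises the target motion conditioned on source motion + edit text."""

    def __init__(self, pose_dim=72, text_dim=512, hidden=512):
        super().__init__()
        self.proj = nn.Linear(2 * pose_dim + text_dim + 1, hidden)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, noisy_target, source, t, text_emb):
        # noisy_target, source: (batch, frames, pose_dim); text_emb: (batch, text_dim)
        b, f, _ = noisy_target.shape
        cond = torch.cat(
            [noisy_target, source,
             text_emb[:, None, :].expand(b, f, -1),
             t[:, None, None].float().expand(b, f, 1)],
            dim=-1,
        )
        return self.out(torch.relu(self.proj(cond)))


if __name__ == "__main__":
    model = EditDenoiser()
    noisy = torch.randn(4, 60, 72)
    source = torch.randn(4, 60, 72)
    t = torch.randint(0, 1000, (4,))
    text = torch.randn(4, 512)
    print(model(noisy, source, t, text).shape)  # torch.Size([4, 60, 72])
```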