Towards motion from video diffusion models
- URL: http://arxiv.org/abs/2411.12831v1
- Date: Tue, 19 Nov 2024 19:35:28 GMT
- Title: Towards motion from video diffusion models
- Authors: Paul Janson, Tiberiu Popa, Eugene Belilovsky
- Abstract summary: We propose to synthesize human motion by deforming an SMPL-X body representation guided by score distillation sampling (SDS) computed with a video diffusion model.
By analyzing the fidelity of the resulting animations, we gain insights into the extent to which we can obtain motion using publicly available text-to-video diffusion models.
- Score: 10.493424298717864
- Abstract: Text-conditioned video diffusion models have emerged as a powerful tool in the realm of video generation and editing, but their ability to capture the nuances of human movement remains under-explored. Indeed, the ability of these models to faithfully follow an array of text prompts could lead to a host of applications in human and character animation. In this work, we take initial steps to investigate whether these models can effectively guide the synthesis of realistic human body animations. Specifically, we propose to synthesize human motion by deforming an SMPL-X body representation guided by score distillation sampling (SDS) computed with a video diffusion model. By analyzing the fidelity of the resulting animations, we gain insights into the extent to which we can obtain motion from publicly available text-to-video diffusion models using SDS. Our findings shed light on the potential and limitations of these models for generating diverse and plausible human motions, paving the way for further research in this exciting area.
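For intuition, the sketch below illustrates the loop the abstract describes: per-frame SMPL-X body-pose parameters are optimized with an SDS loss computed by a frozen text-to-video diffusion model, i.e. the gradient w(t) * (eps_pred - eps) is pushed back through a differentiable rendering of the posed body. This is a minimal, assumption-laden sketch, not the authors' code: `render_frames` and `predict_video_noise` are hypothetical stand-ins (stubbed with random tensors so the snippet runs), and the noise schedule and weighting are generic choices.

```python
import torch

T_FRAMES, C, H, W = 16, 4, 64, 64   # video length and (latent) frame size
NUM_BODY_JOINTS = 21                # SMPL-X body joints, axis-angle (3 values each)

def render_frames(body_pose):
    """Hypothetical differentiable renderer: per-frame SMPL-X poses -> frames.
    Stubbed with random tensors so the example runs; the .mean() term keeps a
    gradient path from the output back to body_pose."""
    return body_pose.mean() + torch.randn(T_FRAMES, C, H, W)

def predict_video_noise(noisy_video, t, prompt):
    """Hypothetical frozen text-to-video diffusion model (noise prediction)."""
    return torch.randn_like(noisy_video)

# Per-frame pose parameters to optimize.
body_pose = torch.zeros(T_FRAMES, NUM_BODY_JOINTS * 3, requires_grad=True)
optimizer = torch.optim.Adam([body_pose], lr=1e-2)
prompt = "a person waving both hands"

for step in range(200):
    optimizer.zero_grad()
    video = render_frames(body_pose)                      # (T, C, H, W)

    # SDS: noise the rendering at a random timestep and ask the frozen
    # diffusion model to predict that noise.
    t = torch.randint(20, 980, (1,))
    alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2  # toy schedule
    noise = torch.randn_like(video)
    noisy_video = alpha_bar.sqrt() * video + (1.0 - alpha_bar).sqrt() * noise
    noise_pred = predict_video_noise(noisy_video, t, prompt)

    # The SDS gradient w(t) * (eps_pred - eps) is injected by detaching it and
    # differentiating this surrogate loss w.r.t. the rendered video; the
    # gradient then flows into the pose parameters.
    w = 1.0 - alpha_bar
    sds_grad = w * (noise_pred - noise)
    loss = (sds_grad.detach() * video).sum()
    loss.backward()
    optimizer.step()
```

In a real setup the two stubs would be replaced by a differentiable rasterization of the posed SMPL-X mesh and by the noise-prediction network of a pretrained text-to-video model (typically with classifier-free guidance applied to the predicted noise).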
Related papers
- Shape Conditioned Human Motion Generation with Diffusion Model [0.0]
We propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format.
We also propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain.
arXiv Detail & Related papers (2024-05-10T19:06:41Z)
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance [25.346255905155424]
We introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework.
By representing the 3D human parametric model as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion.
Our approach also exhibits superior generalization capabilities on the proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-03-21T18:52:58Z)
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [40.71940056121056]
We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
arXiv Detail & Related papers (2023-12-03T14:17:11Z)
- Real-time Animation Generation and Control on Rigged Models via Large Language Models [50.034712575541434]
We introduce a novel method for real-time animation control and generation on rigged models using natural language input.
We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
arXiv Detail & Related papers (2023-10-27T01:36:35Z)
- LLM-grounded Video Diffusion Models [57.23066793349706]
Video diffusion models have emerged as a promising tool for neural video generation.
Current models still struggle with complex prompts and often produce restricted or incorrect motion.
We introduce LLM-grounded Video Diffusion (LVD).
Our results demonstrate that LVD significantly outperforms its base video diffusion model.
arXiv Detail & Related papers (2023-09-29T17:54:46Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across a wide range of human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- FLAME: Free-form Language-based Motion Synthesis & Editing [17.70085940884357]
We propose a diffusion-based motion synthesis and editing model named FLAME.
FLAME can generate high-fidelity motions well aligned with the given text.
It can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning.
arXiv Detail & Related papers (2022-09-01T10:34:57Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on 3DPW, an in-the-wild human video dataset.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.