Motion-Conditioned Diffusion Model for Controllable Video Synthesis
- URL: http://arxiv.org/abs/2304.14404v1
- Date: Thu, 27 Apr 2023 17:59:32 GMT
- Title: Motion-Conditioned Diffusion Model for Controllable Video Synthesis
- Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin,
Ming-Hsuan Yang
- Abstract summary: We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
- Score: 75.367816656045
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in diffusion models have greatly improved the quality and
diversity of synthesized content. To harness the expressive power of diffusion
models, researchers have explored various controllable mechanisms that allow
users to intuitively guide the content synthesis process. Although the latest
efforts have primarily focused on video synthesis, there has been a lack of
effective methods for controlling and describing desired content and motion. In
response to this gap, we introduce MCDiff, a conditional diffusion model that
generates a video from a starting image frame and a set of strokes, which allow
users to specify the intended content and dynamics for synthesis. To tackle the
ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff
first utilizes a flow completion model to predict the dense video motion based
on the semantic understanding of the video frame and the sparse motion control.
Then, the diffusion model synthesizes high-quality future frames to form the
output video. We qualitatively and quantitatively show that MCDiff achieves the
state-of-the-art visual quality in stroke-guided controllable video synthesis.
Additional experiments on MPII Human Pose further exhibit the capability of our
model on diverse content and motion synthesis.
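The abstract describes a two-stage pipeline: a flow completion model densifies the user's sparse strokes into per-pixel motion, and a conditional diffusion model then synthesizes the future frames. The following is a minimal sketch of that control flow; the module names, channel layouts, and the toy DDPM sampler are hypothetical stand-ins, not the released MCDiff implementation.
```python
# Minimal sketch of the two-stage MCDiff-style pipeline described in the abstract.
# All module names, shapes, and the toy DDPM sampler are hypothetical stand-ins.
import torch
import torch.nn as nn

class FlowCompletionNet(nn.Module):
    """Predicts dense per-pixel flow from the start frame and the sparse stroke motion."""
    def __init__(self):
        super().__init__()
        # input: 3 RGB channels + 2 sparse-flow channels + 1 stroke mask
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # output: dense (dx, dy) flow
        )
    def forward(self, frame, sparse_flow, stroke_mask):
        return self.net(torch.cat([frame, sparse_flow, stroke_mask], dim=1))

class FrameDenoiser(nn.Module):
    """Toy conditional denoiser: predicts noise for the next frame given
    the start frame and the completed dense flow (timestep embedding omitted)."""
    def __init__(self):
        super().__init__()
        # input: noisy next frame (3) + start frame (3) + dense flow (2)
        self.net = nn.Sequential(
            nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, noisy_next, frame, dense_flow, t):
        return self.net(torch.cat([noisy_next, frame, dense_flow], dim=1))

def synthesize_next_frame(frame, sparse_flow, stroke_mask, steps=50):
    """Stage 1: complete sparse strokes into dense motion.
       Stage 2: run a toy DDPM-style reverse process conditioned on it."""
    flow_net, denoiser = FlowCompletionNet(), FrameDenoiser()
    dense_flow = flow_net(frame, sparse_flow, stroke_mask)
    x = torch.randn_like(frame)                      # start from pure noise
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas_cum = torch.cumprod(1.0 - betas, dim=0)
    for t in reversed(range(steps)):
        eps = denoiser(x, frame, dense_flow, t)
        x = (x - betas[t] / torch.sqrt(1.0 - alphas_cum[t]) * eps) / torch.sqrt(1.0 - betas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # synthesized next frame

frame = torch.randn(1, 3, 64, 64)        # starting image frame
sparse_flow = torch.zeros(1, 2, 64, 64)  # user strokes rasterized as sparse flow
sparse_flow[:, :, 30:34, 30:34] = 1.0
stroke_mask = (sparse_flow.abs().sum(1, keepdim=True) > 0).float()
next_frame = synthesize_next_frame(frame, sparse_flow, stroke_mask)
print(next_frame.shape)  # torch.Size([1, 3, 64, 64])
```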
Related papers
- Towards motion from video diffusion models [10.493424298717864] (arXiv, 2024-11-19T19:35:28Z)
We propose to synthesize human motion by deforming an SMPL-X body representation, guided by Score Distillation Sampling (SDS) computed with a video diffusion model.
By analyzing the fidelity of the resulting animations, we gain insights into the extent to which we can obtain motion using publicly available text-to-video diffusion models.
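As a rough illustration of the idea in this entry, the sketch below optimizes a pose vector with Score Distillation Sampling; the differentiable renderer and the video-diffusion noise predictor are stub stand-ins (the actual method uses an SMPL-X renderer and a pretrained text-to-video diffusion model).
```python
# Minimal SDS sketch: push rendered output toward a diffusion prior by
# backpropagating (eps_pred - eps) only through the renderer.
import torch

T, C, H, W = 8, 3, 32, 32
proj = torch.randn(T * C * H * W, 63)    # fixed random "renderer" weights (stand-in)

def render_video(pose):
    """Stub differentiable renderer: SMPL-X-like pose vector -> video tensor."""
    return torch.tanh(proj @ pose).view(T, C, H, W)

def predict_noise(noisy_video, t):
    """Stub for a pretrained video diffusion model's noise prediction."""
    return 0.1 * noisy_video  # stand-in; a real model returns a learned eps

pose = torch.zeros(63, requires_grad=True)   # e.g. 21 joints x 3 axis-angle dims
opt = torch.optim.Adam([pose], lr=1e-2)
alphas_cum = torch.cumprod(1 - torch.linspace(1e-4, 2e-2, 1000), dim=0)

for step in range(200):
    video = render_video(pose)
    t = torch.randint(50, 950, (1,)).item()
    eps = torch.randn_like(video)
    noisy = torch.sqrt(alphas_cum[t]) * video + torch.sqrt(1 - alphas_cum[t]) * eps
    eps_pred = predict_noise(noisy, t)
    # SDS surrogate loss: gradient w.r.t. the rendered video is (eps_pred - eps),
    # with no gradient flowing through the diffusion model itself.
    grad = (eps_pred - eps).detach()
    loss = (grad * video).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```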
- Video Diffusion Models are Training-free Motion Interpreter and Controller [20.361790608772157] (arXiv, 2024-05-23T17:59:40Z)
This paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models.
We present a new MOtion FeaTure (MOFT) by eliminating content correlation information and filtering motion channels.
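The summary suggests a simple recipe: cancel appearance shared across frames, then keep the channels that vary most over time. The sketch below is only one reading of that recipe; the feature shapes and the variance-based channel selection are assumptions, not the paper's exact MOFT construction.
```python
# Illustrative sketch: remove content correlation and keep motion-sensitive channels.
import torch

def motion_feature(feats: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """feats: diffusion-UNet features for one video, shape (T, C, H, W)."""
    # Remove content correlation: subtract the per-pixel temporal mean,
    # so static appearance cancels out and temporal change remains.
    motion = feats - feats.mean(dim=0, keepdim=True)
    # Filter motion channels: rank channels by temporal variance and keep the top ones.
    channel_var = motion.var(dim=0).mean(dim=(-2, -1))        # (C,)
    k = max(1, int(keep_ratio * feats.shape[1]))
    idx = channel_var.topk(k).indices
    return motion[:, idx]                                      # (T, k, H, W)

feats = torch.randn(16, 320, 32, 32)    # hypothetical feature shape
print(motion_feature(feats).shape)      # torch.Size([16, 80, 32, 32])
```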
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741] (arXiv, 2024-03-15T10:36:24Z)
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643] (arXiv, 2024-01-10T23:26:41Z)
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
- VideoLCM: Video Latent Consistency Model [52.3311704118393] (arXiv, 2023-12-14T16:45:36Z)
VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation techniques for training the latent consistency model.
VideoLCM achieves high-fidelity and smooth video synthesis with only four sampling steps, showcasing the potential for real-time synthesis.
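A compact sketch of consistency distillation in this spirit is given below; the linear teacher/student networks, the timestep discretization, and the EMA rate are placeholder assumptions rather than the VideoLCM setup.
```python
# Consistency distillation sketch: a student learns to map a noisy latent at any
# step directly toward the clean latent, matching an EMA copy of itself evaluated
# one teacher ODE step earlier. Teacher/student are stand-in stubs.
import copy
import torch
import torch.nn as nn

dim = 64
teacher = nn.Linear(dim, dim)                 # stub: predicts eps
student = nn.Linear(dim, dim)                 # stub: predicts x0 directly
ema_student = copy.deepcopy(student)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

N = 50                                        # discretized timesteps
alphas_cum = torch.cumprod(1 - torch.linspace(1e-4, 2e-2, N), dim=0)

def ddim_step(x_t, t, t_prev):
    """One deterministic teacher step from t to t_prev (DDIM update)."""
    eps = teacher(x_t)
    x0 = (x_t - torch.sqrt(1 - alphas_cum[t]) * eps) / torch.sqrt(alphas_cum[t])
    return torch.sqrt(alphas_cum[t_prev]) * x0 + torch.sqrt(1 - alphas_cum[t_prev]) * eps

for step in range(100):
    x0 = torch.randn(8, dim)                  # "clean video latents"
    t = torch.randint(1, N, (1,)).item()
    noise = torch.randn_like(x0)
    x_t = torch.sqrt(alphas_cum[t]) * x0 + torch.sqrt(1 - alphas_cum[t]) * noise
    with torch.no_grad():
        x_prev = ddim_step(x_t, t, t - 1)     # teacher ODE step
        target = ema_student(x_prev)          # EMA student at the earlier step
    loss = ((student(x_t) - target) ** 2).mean()   # consistency loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    for p_ema, p in zip(ema_student.parameters(), student.parameters()):
        p_ema.data.mul_(0.99).add_(p.data, alpha=0.01)   # EMA update
```
At inference the distilled student can be applied for a handful of steps (four in VideoLCM) instead of running the full reverse diffusion.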
- Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models [18.50942770933098] (arXiv, 2023-04-03T08:17:08Z)
MoDiff is an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities.
Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities.
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191] (arXiv, 2022-12-08T18:59:48Z)
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
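One way to read "well-known kinematic losses" is a bone-length consistency penalty added to the usual denoising objective; the sketch below illustrates that combination with hypothetical joint indices and weighting, not MoFusion's actual loss terms.
```python
# Hedged sketch: add a kinematic plausibility term to a motion-diffusion training loss.
import torch

BONES = [(0, 1), (1, 2), (2, 3)]        # hypothetical parent-child joint pairs

def bone_length_loss(joints: torch.Tensor) -> torch.Tensor:
    """joints: (T, J, 3) denoised joint positions for one motion sequence.
    Penalizes bone lengths that change over time (implausible skeletons)."""
    lengths = torch.stack(
        [(joints[:, a] - joints[:, b]).norm(dim=-1) for a, b in BONES], dim=-1
    )                                               # (T, num_bones)
    return lengths.var(dim=0).mean()                # zero if lengths stay constant

def training_loss(eps_pred, eps, joints_denoised, lambda_kin=0.1):
    diffusion_loss = ((eps_pred - eps) ** 2).mean()          # standard DDPM term
    return diffusion_loss + lambda_kin * bone_length_loss(joints_denoised)

# toy usage
eps = torch.randn(60, 4, 3)
eps_pred = eps + 0.01 * torch.randn_like(eps)
joints = torch.randn(60, 4, 3)
print(training_loss(eps_pred, eps, joints))
```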
- VIDM: Video Implicit Diffusion Models [75.90225524502759] (arXiv, 2022-12-01T02:58:46Z)
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.