MoVideo: Motion-Aware Video Generation with Diffusion Models
- URL: http://arxiv.org/abs/2311.11325v1
- Date: Sun, 19 Nov 2023 13:36:03 GMT
- Title: MoVideo: Motion-Aware Video Generation with Diffusion Models
- Authors: Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool,
Rakesh Ranjan
- Abstract summary: We propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
- Score: 102.81825637792572
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While recent years have witnessed great progress on using diffusion models
for video generation, most of them are simple extensions of image generation
frameworks, which fail to explicitly consider one of the key differences
between videos and images, i.e., motion. In this paper, we propose a novel
motion-aware video generation (MoVideo) framework that takes motion into
consideration from two aspects: video depth and optical flow. The former
regulates motion via per-frame object distances and spatial layouts, while the
latter describes motion via cross-frame correspondences that help preserve
fine details and improve temporal consistency. More specifically, given a key
frame that either exists or is generated from a text prompt, we first design a
diffusion model with spatio-temporal modules to generate the video depth and
the corresponding optical flows. Then, the video is generated in the latent
space by another spatio-temporal diffusion model under the guidance of the
depth, the optical-flow-warped latent video, and the computed occlusion mask.
Lastly, we
use optical flows again to align and refine different frames for better video
decoding from the latent space to the pixel space. In experiments, MoVideo
achieves state-of-the-art results in both text-to-video and image-to-video
generation, showing promising prompt consistency, frame consistency and visual
quality.
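A concrete note on the second stage: conditioning latent video generation on a
flow-warped latent and an occlusion mask rests on standard warping machinery.
The PyTorch sketch below is a minimal illustration of that machinery, assuming
backward warping via grid_sample and a forward-backward consistency check for
occlusions; the function names, tensor layouts, and thresholds are assumptions
for illustration, not MoVideo's released code.

```python
# Illustrative sketch only: backward warping and an occlusion mask via
# forward-backward flow consistency. Names, shapes, and thresholds are
# assumed, not taken from MoVideo's implementation.
import torch
import torch.nn.functional as F

def warp_with_flow(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp x (B, C, H, W) with a pixel-unit flow field (B, 2, H, W)."""
    _, _, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype),
        indexing="ij",
    )
    # Displace the base grid by the flow, then normalize to [-1, 1] for grid_sample.
    gx = (xs.unsqueeze(0) + flow[:, 0]) * 2.0 / max(w - 1, 1) - 1.0
    gy = (ys.unsqueeze(0) + flow[:, 1]) * 2.0 / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)  # (B, H, W, 2) in (x, y) order
    return F.grid_sample(x, grid, align_corners=True)

def occlusion_mask(flow_fw: torch.Tensor, flow_bw: torch.Tensor,
                   alpha1: float = 0.01, alpha2: float = 0.5) -> torch.Tensor:
    """Forward-backward consistency check: 1 where the warp is reliable, 0 where occluded."""
    flow_bw_warped = warp_with_flow(flow_bw, flow_fw)
    diff = (flow_fw + flow_bw_warped).pow(2).sum(dim=1)
    mag = flow_fw.pow(2).sum(dim=1) + flow_bw_warped.pow(2).sum(dim=1)
    return (diff < alpha1 * mag + alpha2).to(flow_fw.dtype).unsqueeze(1)
```

In such a setup, the mask would down-weight guidance wherever the warped latent
is unreliable (e.g., disoccluded regions); how MoVideo injects these signals
into the diffusion model is not specified in the abstract.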
Related papers
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [117.13475564834458]
We propose a new way of self-attention calculation, termed Consistent Self-Attention.
To extend our method to long-range video generation, we introduce a novel temporal motion prediction module that operates in a semantic space.
By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos.
arXiv Detail & Related papers (2024-05-02T16:25:16Z)
- LoopAnimate: Loopable Salient Object Animation [19.761865029125524]
LoopAnimate is a novel method for generating videos with consistent start and end frames.
It achieves state-of-the-art performance on both objective metrics, such as fidelity and temporal consistency, and subjective evaluations.
arXiv Detail & Related papers (2024-04-14T07:36:18Z)
- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [50.65904921917907]
We propose Customize-A-Video, which models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal variety.
Our proposed method can be easily extended to various downstream tasks, including custom video generation and editing, video appearance customization, and multiple motion combination.
arXiv Detail & Related papers (2024-02-22T18:38:48Z)
- Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
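As a toy illustration of joint spatial and temporal down-sampling, the minimal
PyTorch module below halves frames, height, and width in a single strided 3D
convolution; it is a sketch under assumed layer choices, not Lumiere's actual
Space-Time U-Net.

```python
# Toy space-time down-sampling block; all layer choices are assumptions.
import torch
import torch.nn as nn

class SpaceTimeDownBlock(nn.Module):
    """Halves the temporal and spatial resolution of a (B, C, T, H, W) tensor."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        # Stride (2, 2, 2) down-samples time, height, and width together.
        self.down = nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.conv(x)))

x = torch.randn(1, 8, 16, 64, 64)   # (batch, channels, frames, H, W)
y = SpaceTimeDownBlock(8, 16)(x)
print(y.shape)                      # torch.Size([1, 16, 8, 32, 32])
```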
arXiv Detail & Related papers (2024-01-23T18:05:25Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
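A rough sketch of what such a residual-based motion objective can look like,
assuming latents shaped (B, T, C, H, W): the motion of a clip is summarized by
differences between consecutive frames, and the adapted model's prediction is
pulled toward the reference video's residuals. The loss form and names below
are guesses for illustration, not VMC's exact formulation.

```python
# Illustrative residual-based motion objective; not VMC's exact loss.
import torch
import torch.nn.functional as F

def frame_residuals(latents: torch.Tensor) -> torch.Tensor:
    """Consecutive-frame differences for latents shaped (B, T, C, H, W)."""
    return latents[:, 1:] - latents[:, :-1]

def motion_distillation_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Match the predicted residuals to the reference video's residuals."""
    return F.mse_loss(frame_residuals(pred), frame_residuals(ref))
```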
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.