VIDM: Video Implicit Diffusion Models
- URL: http://arxiv.org/abs/2212.00235v1
- Date: Thu, 1 Dec 2022 02:58:46 GMT
- Title: VIDM: Video Implicit Diffusion Models
- Authors: Kangfu Mei and Vishal M. Patel
- Abstract summary: Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled as an implicit condition.
We improve the quality of the generated videos with several strategies: sampling space truncation, a robustness penalty, and positional group normalization.
- Score: 75.90225524502759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have emerged as a powerful generative method for
synthesizing high-quality and diverse images. In this paper, we propose a video
generation method based on diffusion models, where the effects of motion are
modeled as an implicit condition, i.e., one can sample plausible video motions
according to the latent features of frames. We improve the quality of the
generated videos with multiple strategies, including sampling space truncation,
a robustness penalty, and positional group normalization. Experiments are
conducted on datasets of videos with different resolutions and different
numbers of frames. Results show that the proposed method outperforms
state-of-the-art generative adversarial network-based methods by a significant
margin in terms of FVD scores as well as perceptible visual quality.
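A minimal sketch of the implicit-condition idea from the abstract above, assuming a DDPM-style sampler: the denoiser takes a latent feature of a reference frame as its condition, and the initial noise is clamped as a crude stand-in for sampling space truncation. All module names, shapes, and hyperparameters here are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch (not the VIDM code): frame-latent-conditioned reverse diffusion.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy epsilon-predictor; a real model would be a video U-Net."""
    def __init__(self, channels=3, cond_dim=128):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, channels)
        self.net = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x_t, t, frame_latent):
        # Inject the frame latent as a per-channel bias (implicit motion condition).
        bias = self.cond_proj(frame_latent)[:, :, None, None]
        return self.net(x_t + bias)

@torch.no_grad()
def sample_frame(model, frame_latent, betas, shape, trunc=1.5):
    """Ancestral sampling with clamped initial noise (assumed form of truncation)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape).clamp_(-trunc, trunc)        # sampling-space truncation
    for t in reversed(range(len(betas))):
        eps = model(x, t, frame_latent)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])  # DDPM posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

betas = torch.linspace(1e-4, 0.02, 50)
model = ConditionedDenoiser()
latent = torch.randn(2, 128)            # latent features of two reference frames
frames = sample_frame(model, latent, betas, shape=(2, 3, 32, 32))
```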
Related papers
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z)
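A hedged sketch of the prompt-optimization loop the MotionPrompt entry above describes: learnable token embeddings are updated with gradients from a discriminator applied to a random pair of generated frames. The toy generator and discriminator below (TinyVideoGen, PairDiscriminator) are placeholders, not the paper's optical-flow-based components.

```python
# Hypothetical sketch, not the MotionPrompt implementation.
import torch
import torch.nn as nn

class TinyVideoGen(nn.Module):
    """Toy differentiable 'sampler': maps prompt embeddings to a short clip."""
    def __init__(self, embed_dim=32, frames=8, size=16):
        super().__init__()
        self.proj = nn.Linear(embed_dim, frames * 3 * size * size)
        self.frames, self.size = frames, size

    def forward(self, prompt_emb):
        clip = self.proj(prompt_emb.mean(dim=1))
        return clip.view(-1, self.frames, 3, self.size, self.size)

class PairDiscriminator(nn.Module):
    """Scores whether two frames exhibit plausible motion (higher = more plausible)."""
    def __init__(self, size=16):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(2 * 3 * size * size, 1))

    def forward(self, frame_a, frame_b):
        return self.net(torch.cat([frame_a, frame_b], dim=1))

gen, disc = TinyVideoGen(), PairDiscriminator()
prompt_emb = nn.Parameter(torch.randn(1, 4, 32))       # learnable token embeddings
opt = torch.optim.Adam([prompt_emb], lr=1e-2)          # only the embeddings are updated

for step in range(10):                                  # refinement during sampling
    video = gen(prompt_emb)                             # (B, T, C, H, W)
    i, j = torch.randint(0, video.shape[1], (2,))       # random frame pair
    loss = -disc(video[:, i], video[:, j]).mean()       # encourage plausible motion
    opt.zero_grad()
    loss.backward()
    opt.step()
```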
- Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach [29.753974393652356]
We propose a frame-aware video diffusion model (FVDM).
Our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies.
Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks.
arXiv Detail & Related papers (2024-10-04T05:47:39Z)
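A small sketch of the vectorized-timestep idea from the FVDM entry above: during training, each frame of a clip is noised according to its own timestep rather than one scalar timestep shared by the whole clip. Shapes and schedule values are assumptions, not the paper's settings.

```python
# Hypothetical sketch of per-frame (vectorized) diffusion timesteps.
import torch

def add_noise_per_frame(video, alpha_bars):
    """video: (B, T, C, H, W); alpha_bars: (num_steps,) cumulative alphas."""
    B, T = video.shape[:2]
    t = torch.randint(0, len(alpha_bars), (B, T))        # a timestep *vector* per clip
    a = alpha_bars[t].view(B, T, 1, 1, 1)                # per-frame noise level
    noise = torch.randn_like(video)
    noisy = a.sqrt() * video + (1.0 - a).sqrt() * noise  # frame-wise forward process
    return noisy, noise, t

betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)
clip = torch.randn(2, 16, 3, 32, 32)
noisy_clip, eps, t_vec = add_noise_per_frame(clip, alpha_bars)
```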
- Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation [134.22372190926362]
Image diffusion distillation achieves high-fidelity generation with very few sampling steps.
Applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to limited visual quality in public video datasets.
Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data.
arXiv Detail & Related papers (2024-06-11T02:09:46Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
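A hedged outline of the three-stage training curriculum the Stable Video Diffusion entry identifies, written as a plain configuration list; the data descriptions and the choice of which layers each stage trains are illustrative assumptions, not values from the paper.

```python
# Illustrative stage list only; dataset descriptions and trained-layer choices are assumed.
stages = [
    {"name": "text_to_image_pretraining", "data": "large image-text corpus",
     "trains": "spatial layers"},
    {"name": "video_pretraining", "data": "large curated video corpus",
     "trains": "spatial + temporal layers"},
    {"name": "high_quality_video_finetuning", "data": "small high-quality video set",
     "trains": "spatial + temporal layers"},
]

for stage in stages:
    print(f"{stage['name']}: train {stage['trains']} on {stage['data']}")
```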
- GD-VDM: Generated Depth for better Diffusion-based Video Generation [18.039417502897486]
This paper proposes GD-VDM, a novel diffusion model for video generation, demonstrating promising results.
We evaluated GD-VDM on the Cityscapes dataset and found that it generates more diverse and complex scenes compared to natural baselines.
arXiv Detail & Related papers (2023-06-19T21:32:10Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Video Probabilistic Diffusion Models in Projected Latent Space [75.4253202574722]
We propose a novel generative model for videos, coined projected latent video diffusion models (PVDM).
PVDM learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources.
arXiv Detail & Related papers (2023-02-15T14:22:34Z)
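A rough sketch of training a diffusion model in a compressed latent space, in the spirit of the PVDM entry above: a toy encoder maps a clip to a low-dimensional latent, noise is added there, and a denoiser learns to predict that noise. The architectures and the single flat latent vector are placeholders; PVDM's actual projected latent design differs.

```python
# Hypothetical latent-space diffusion training step, not the PVDM implementation.
import torch
import torch.nn as nn

class ToyVideoEncoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))

    def forward(self, video):                 # (B, T, C, H, W) -> (B, latent_dim)
        return self.net(video)

class LatentDenoiser(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.SiLU(),
                                 nn.Linear(128, latent_dim))

    def forward(self, z_t, t):
        return self.net(z_t)                  # toy: ignores the timestep

encoder, denoiser = ToyVideoEncoder(), LatentDenoiser()
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

clip = torch.randn(4, 16, 3, 64, 64)           # a high-resolution clip
with torch.no_grad():
    z0 = encoder(clip)                          # diffusion runs in this small space
t = torch.randint(0, 1000, (z0.shape[0],))
a = alpha_bars[t].unsqueeze(1)
noise = torch.randn_like(z0)
z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise
loss = ((denoiser(z_t, t) - noise) ** 2).mean() # standard epsilon-prediction loss
loss.backward()
opt.step()
```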
- Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)
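A hypothetical sketch of a cascaded pipeline in the spirit of the Imagen Video entry above: a base stage produces a small, short clip, and later stages upsample it in time and space. The stages below are simple interpolation stand-ins rather than diffusion models, and all shapes are assumptions.

```python
# Cascade structure only; each stage is a placeholder, not an Imagen Video model.
import torch
import torch.nn.functional as F

def base_stage(prompt_embedding, frames=8, size=16):
    """Stand-in for the base text-conditional video diffusion model."""
    # A real base model would run a full diffusion sampler conditioned on the prompt.
    batch = prompt_embedding.shape[0]
    return torch.rand(batch, frames, 3, size, size)

def temporal_sr_stage(clip, scale=2):
    """Stand-in for a temporal super-resolution (frame interpolation) stage."""
    b, t, c, h, w = clip.shape
    flat = clip.permute(0, 2, 3, 4, 1).reshape(b, c * h * w, t)
    up = F.interpolate(flat, scale_factor=scale, mode="linear", align_corners=False)
    return up.reshape(b, c, h, w, t * scale).permute(0, 4, 1, 2, 3)

def spatial_sr_stage(clip, scale=2):
    """Stand-in for a spatial super-resolution diffusion stage."""
    b, t, c, h, w = clip.shape
    up = F.interpolate(clip.view(b * t, c, h, w), scale_factor=scale,
                       mode="bilinear", align_corners=False)
    return up.view(b, t, c, h * scale, w * scale)

prompt = torch.randn(1, 77, 512)               # assumed text-encoder output shape
clip = base_stage(prompt)                      # (1, 8, 3, 16, 16)
clip = temporal_sr_stage(clip)                 # (1, 16, 3, 16, 16)
for _ in range(2):                             # spatial cascade: 16 -> 32 -> 64 px
    clip = spatial_sr_stage(clip)
print(clip.shape)                              # torch.Size([1, 16, 3, 64, 64])
```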
This list is automatically generated from the titles and abstracts of the papers on this site.