VIDM: Video Implicit Diffusion Models
- URL: http://arxiv.org/abs/2212.00235v1
- Date: Thu, 1 Dec 2022 02:58:46 GMT
- Title: VIDM: Video Implicit Diffusion Models
- Authors: Kangfu Mei and Vishal M. Patel
- Abstract summary: Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled as an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
- Score: 75.90225524502759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have emerged as a powerful generative method for
synthesizing high-quality and diverse images. In this paper, we propose a
video generation method based on diffusion models, where the effects of
motion are modeled as an implicit condition, i.e., plausible video motions
can be sampled according to the latent features of the frames. We improve
the quality of the generated videos by proposing multiple strategies such as
sampling space truncation, a robustness penalty, and positional group
normalization. Experiments are conducted on datasets consisting of videos
with different resolutions and different numbers of frames. The results show
that the proposed method outperforms state-of-the-art generative adversarial
network-based methods by a significant margin in terms of FVD scores as well
as perceptible visual quality.
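The abstract names sampling space truncation and implicit motion conditioning but does not detail them here. Purely as an illustration, the Python sketch below shows one plausible reading: a DDPM-style sampler whose Gaussian noise is clamped to a truncation threshold and whose denoiser is conditioned on a frame latent that implicitly encodes motion. The denoiser interface, the linear beta schedule, the threshold value, and all tensor shapes are assumptions for the example, not the authors' implementation.

import torch

@torch.no_grad()
def sample_video(denoiser, frame_latent, num_frames=16, steps=50,
                 shape=(3, 64, 64), trunc=1.5, device="cpu"):
    # Hypothetical DDPM-style sampler: motion enters only through a latent
    # feature of a reference frame ("implicit condition"), and the sampling
    # space is truncated by clamping Gaussian noise to [-trunc, trunc]
    # (an assumed reading of "sampling space truncation").
    betas = torch.linspace(1e-4, 0.02, steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Truncated initial noise for every frame of the clip.
    x = torch.randn(num_frames, *shape, device=device).clamp_(-trunc, trunc)

    for t in reversed(range(steps)):
        # The denoiser predicts the noise from the noisy frames, the timestep,
        # and the frame latent that implicitly encodes the motion.
        eps = denoiser(x, t, cond=frame_latent)
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            noise = torch.randn_like(x).clamp_(-trunc, trunc)  # truncated per step
            x = mean + betas[t].sqrt() * noise
        else:
            x = mean
    return x  # (num_frames, C, H, W) generated frames

The positional group normalization layer and the robustness penalty mentioned in the abstract are omitted from this sketch; the point is only that motion enters the sampler as a learned condition rather than an explicit flow field.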
Related papers
- Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach [29.753974393652356]
We propose a frame-aware video diffusion model (FVDM).
Our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies (a minimal sketch of this per-frame timestep idea appears after this list).
Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks.
arXiv Detail & Related papers (2024-10-04T05:47:39Z)
- JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation [6.463753697299011]
We introduce the Joint Video-Image Diffusion model (JVID), a novel approach to generating high-quality temporally coherent videos.
Our results demonstrate quantitative and qualitative improvements in producing realistic and coherent videos.
arXiv Detail & Related papers (2024-09-21T13:59:50Z)
- Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation [134.22372190926362]
Image diffusion distillation achieves high-fidelity generation with very few sampling steps.
Applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to limited visual quality in public video datasets.
Our study aims to improve video diffusion distillation while enhancing frame appearance using abundant high-quality image data.
arXiv Detail & Related papers (2024-06-11T02:09:46Z)
- Video Interpolation with Diffusion Models [54.06746595879689]
We present VIDIM, a generative model for video, which creates short videos given a start and end frame.
VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video.
arXiv Detail & Related papers (2024-04-01T15:59:32Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
- GD-VDM: Generated Depth for better Diffusion-based Video Generation [18.039417502897486]
This paper proposes GD-VDM, a novel diffusion model for video generation, demonstrating promising results.
We evaluated GD-VDM on the Cityscapes dataset and found that it generates more diverse and complex scenes compared to natural baselines.
arXiv Detail & Related papers (2023-06-19T21:32:10Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Video Probabilistic Diffusion Models in Projected Latent Space [75.4253202574722]
We propose a novel generative model for videos, coined projected latent video diffusion models (PVDM).
PVDM learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources.
arXiv Detail & Related papers (2023-02-15T14:22:34Z)
- Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also of having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)
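As referenced in the FVDM entry above, the vectorized-timestep idea can be illustrated compactly: rather than sharing one scalar timestep across a clip, each frame is noised with its own timestep during forward diffusion. The helper below is a hedged sketch under that reading, with assumed shapes and schedule; it is not the FVDM authors' code.

import torch

def noise_frames_per_timestep(frames, alpha_bars):
    # frames:     (T, C, H, W) clean video frames
    # alpha_bars: (steps,) cumulative products of (1 - beta_t)
    # Returns the noisy frames, the per-frame timesteps, and the sampled noise.
    num_frames = frames.shape[0]
    steps = alpha_bars.shape[0]

    # Vectorized timestep: an independent t for every frame of the clip.
    t = torch.randint(0, steps, (num_frames,), device=frames.device)

    # Broadcast per-frame coefficients over the (C, H, W) dimensions.
    a = alpha_bars[t].view(num_frames, 1, 1, 1)
    noise = torch.randn_like(frames)
    noisy = a.sqrt() * frames + (1.0 - a).sqrt() * noise
    return noisy, t, noise

Setting every entry of t to the same value recovers the usual shared-timestep formulation, which is why the vectorized timestep generalizes rather than replaces the standard schedule.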
This list is automatically generated from the titles and abstracts of the papers in this site.