Video Diffusion Models
- URL: http://arxiv.org/abs/2204.03458v1
- Date: Thu, 7 Apr 2022 14:08:02 GMT
- Title: Video Diffusion Models
- Authors: Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad
Norouzi, David J. Fleet
- Abstract summary: Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
We propose a diffusion model for video generation that shows very promising initial results.
We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
- Score: 47.99413440461512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating temporally coherent high fidelity video is an important milestone
in generative modeling research. We make progress towards this milestone by
proposing a diffusion model for video generation that shows very promising
initial results. Our model is a natural extension of the standard image
diffusion architecture, and it enables jointly training from image and video
data, which we find to reduce the variance of minibatch gradients and speed up
optimization. To generate long and higher resolution videos we introduce a new
conditional sampling technique for spatial and temporal video extension that
performs better than previously proposed methods. We present the first results
on a large text-conditioned video generation task, as well as state-of-the-art
results on an established unconditional video generation benchmark.
Supplementary material is available at https://video-diffusion.github.io/
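The conditional sampling technique mentioned in the abstract is described only at a high level here. Below is a minimal sketch of one plausible reading, reconstruction-guided sampling: at each denoising step, the model's estimate of the unobserved frames is nudged by the gradient of a reconstruction loss on the frames being conditioned on. The model interface, the guidance weight `w_r`, and the schedule variable `alpha_t` are illustrative assumptions, not the authors' released code.

```python
import torch

def reconstruction_guided_estimate(model, z_t, t, x_cond, cond_mask,
                                   alpha_t, w_r=2.0):
    """One guided denoising estimate for conditional video extension.

    z_t:       noisy sample, shape (B, T, C, H, W)
    x_cond:    ground-truth values for the conditioning frames (same shape;
               only positions where cond_mask is True are used)
    cond_mask: boolean tensor, True at observed/conditioning frames
    alpha_t:   signal scale of the noise schedule at step t (assumption:
               z_t = alpha_t * x + sigma_t * eps)
    w_r:       guidance weight (hypothetical default)
    """
    z_t = z_t.detach().requires_grad_(True)
    x_hat = model(z_t, t)  # denoised estimate of all frames
    # Reconstruction loss measured only on the observed frames.
    err = ((x_cond - x_hat) * cond_mask).pow(2).sum()
    grad, = torch.autograd.grad(err, z_t)
    # Nudge the estimate so the observed frames are better explained.
    x_tilde = x_hat - (w_r * alpha_t / 2.0) * grad
    # Clamp the observed frames to their known values.
    return torch.where(cond_mask, x_cond, x_tilde).detach()
```

In an ancestral sampler this guided estimate would replace the plain denoised estimate at every step before forming the next, less noisy latent; temporal extension then amounts to conditioning on earlier frames, and a similar guided estimate could, in principle, condition on a low-resolution video for spatial extension.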
Related papers
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose DreamVideo, a high-fidelity image-to-video generation method that adds a frame-retention branch to a pre-trained video diffusion model.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation [73.54366331493007]
VideoGen is a text-to-video generation approach that can generate high-definition videos with high frame fidelity and strong temporal consistency.
We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt.
arXiv Detail & Related papers (2023-09-01T11:14:43Z)
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models [52.93036326078229]
Off-the-shelf billion-scale datasets for image generation are available, but collecting similar video data of the same scale is still challenging.
In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task.
Our model, Preserve Your Own Correlation (PYoCo), attains state-of-the-art zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks (a sketch of such a correlated noise prior appears after this list).
arXiv Detail & Related papers (2023-05-17T17:59:16Z)
- Latent Video Diffusion Models for High-Fidelity Long Video Generation [58.346702410885236]
We introduce lightweight video diffusion models using a low-dimensional 3D latent space.
We also propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced.
Our framework generates more realistic and longer videos than previous strong baselines.
arXiv Detail & Related papers (2022-11-23T18:58:39Z)
- Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video capable not only of generating videos of high fidelity, but also of exhibiting a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)
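The PYoCo entry above motivates correlating the diffusion noise across frames rather than sampling it i.i.d. per frame. As a concrete illustration, here is a minimal sketch of one such correlated noise prior, where every frame's noise mixes a shared video-level component with an independent per-frame component; the mixing rule and the parameter `alpha` are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def correlated_video_noise(batch, frames, channels, height, width, alpha=1.0):
    """Sample video noise with a shared component across frames.

    Each frame's noise is a normalized mix of one video-level component and
    an independent per-frame component, so frames stay correlated while the
    marginal distribution of every frame remains standard normal.
    """
    shared = torch.randn(batch, 1, channels, height, width)            # common to all frames
    independent = torch.randn(batch, frames, channels, height, width)  # per-frame
    # Normalize so each frame keeps unit variance:
    # Var = (alpha^2 + 1) / (1 + alpha^2) = 1.
    return (alpha * shared + independent) / (1.0 + alpha ** 2) ** 0.5
```

With alpha = 0 this reduces to the usual i.i.d. prior; larger alpha increases the correlation between frames, which is the property such priors exploit to keep generated frames temporally coherent.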
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.