Cascaded Video Generation for Videos In-the-Wild
- URL: http://arxiv.org/abs/2206.00735v1
- Date: Wed, 1 Jun 2022 19:50:50 GMT
- Title: Cascaded Video Generation for Videos In-the-Wild
- Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville
- Abstract summary: We propose a cascaded model for video generation which follows a coarse-to-fine approach.
First, our model generates a low-resolution video, establishing the global scene structure.
We train each cascade level sequentially on partial views of the videos, which reduces the computational complexity.
- Score: 10.017846915566174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Videos can be created by first outlining a global view of the scene and then
adding local details. Inspired by this idea, we propose a cascaded model for
video generation which follows a coarse-to-fine approach. First, our model
generates a low-resolution video, establishing the global scene structure,
which is then refined by subsequent cascade levels operating at larger
resolutions. We train each cascade level sequentially on partial views of the
videos, which reduces the computational complexity of our model and makes it
scalable to high-resolution videos with many frames. We empirically validate
our approach on UCF101 and Kinetics-600, for which our model is competitive
with the state-of-the-art. We further demonstrate the scaling capabilities of
our model by training a three-level model on the BDD100K dataset which generates
256x256 pixel videos with 48 frames.
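The coarse-to-fine cascade described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the learned generative and refinement networks are stubbed with random noise, and all function names are illustrative. It only shows the structure of the pipeline, where a base level fixes the global scene at low resolution and each subsequent level upsamples and adds local detail, reaching the 256x256, 48-frame setting reported for BDD100K.

```python
import numpy as np

def base_level(num_frames, height, width, rng):
    """Stage 1: sample a low-resolution video that fixes the global scene
    structure. Stubbed with noise here; the paper uses a learned model."""
    return rng.standard_normal((num_frames, height, width, 3))

def refine_level(video, scale, rng):
    """One cascade level: upsample the previous level's output and add
    local detail. Nearest-neighbour upsampling plus a small residual
    stands in for the learned refinement network."""
    upsampled = video.repeat(scale, axis=1).repeat(scale, axis=2)
    residual = 0.1 * rng.standard_normal(upsampled.shape)
    return upsampled + residual

def cascaded_generate(num_frames=48, base_res=64, num_levels=3, scale=2, seed=0):
    """Three-level coarse-to-fine cascade: 64x64 -> 128x128 -> 256x256."""
    rng = np.random.default_rng(seed)
    video = base_level(num_frames, base_res, base_res, rng)
    for _ in range(num_levels - 1):
        video = refine_level(video, scale, rng)
    return video

video = cascaded_generate()
print(video.shape)  # (48, 256, 256, 3)
```

Because each level only needs the previous level's output, the levels can be trained one at a time on partial views of the data, which is what keeps the approach scalable to long, high-resolution videos.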
Related papers
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z) - ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - Photorealistic Video Generation with Diffusion Models [44.95407324724976]
W.A.L.T. is a transformer-based approach for video generation via diffusion modeling.
We use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities.
We also train a cascade of three models for the task of text-to-video generation consisting of a base latent video diffusion model and two video super-resolution diffusion models to generate videos of $512 \times$ resolution at $8$ frames per second.
arXiv Detail & Related papers (2023-12-11T18:59:57Z) - Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z) - Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z) - InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z) - Hierarchical Video Generation for Complex Data [14.901308948331321]
We propose a hierarchical model for video generation which follows a coarse-to-fine approach.
First, our model generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent levels in the hierarchy.
We validate our approach on Kinetics-600 and BDD100K, for which we train a three-level model capable of generating 256x256 videos with 48 frames.
arXiv Detail & Related papers (2021-06-04T21:03:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.