Diverse Generation from a Single Video Made Possible
- URL: http://arxiv.org/abs/2109.08591v1
- Date: Fri, 17 Sep 2021 15:12:17 GMT
- Title: Diverse Generation from a Single Video Made Possible
- Authors: Niv Haim, Ben Feinstein, Niv Granot, Assaf Shocher, Shai Bagon, Tali Dekel, Michal Irani
- Abstract summary: We present a fast and practical method for video generation and manipulation from a single natural video.
Our method generates more realistic and higher quality results than single-video GANs.
- Score: 24.39972895902724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most advanced video generation and manipulation methods train on a large
collection of videos. As such, they are restricted to the types of video
dynamics they train on. To overcome this limitation, GANs trained on a single
video were recently proposed. While these provide more flexibility to a wide
variety of video dynamics, they require days to train on a single tiny input
video, rendering them impractical. In this paper we present a fast and
practical method for video generation and manipulation from a single natural
video, which generates diverse high-quality video outputs within seconds (for
benchmark videos). Our method can be further applied to Full-HD video clips
within minutes. Our approach is inspired by a recent advanced
patch-nearest-neighbor based approach [Granot et al. 2021], which was shown to
significantly outperform single-image GANs, both in run-time and in visual
quality. Here we generalize this approach from images to videos, by casting
classical space-time patch-based methods as a new generative video model. We
adapt the generative image patch nearest neighbor approach to efficiently cope
with the huge number of space-time patches in a single video. Our method
generates more realistic and higher quality results than single-video GANs
(confirmed by quantitative and qualitative evaluations). Moreover, it is
disproportionally faster (runtime reduced from several days to seconds). Other
than diverse video generation, we demonstrate several other challenging video
applications, including spatio-temporal video retargeting, video structural
analogies and conditional video-inpainting.
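To make the core idea concrete, below is a minimal, single-scale Python sketch of generative space-time patch nearest neighbors. It is illustrative only and not the authors' implementation: the actual method works coarse-to-fine over a spatio-temporal pyramid and replaces the brute-force search below with an efficient one to cope with the huge number of space-time patches; all function names and parameters here are hypothetical.
```python
# A minimal, single-scale sketch of generative space-time patch nearest neighbors,
# written to illustrate the idea described in the abstract. It is NOT the authors'
# implementation: the real method works coarse-to-fine over a spatio-temporal
# pyramid and uses an efficient search to handle the huge number of patches.
import numpy as np

def extract_patches(video, pt=3, ph=7, pw=7, stride=2):
    """Collect overlapping space-time patches of a (T, H, W, C) video as flat vectors."""
    T, H, W, C = video.shape
    patches, coords = [], []
    for t in range(0, T - pt + 1, stride):
        for y in range(0, H - ph + 1, stride):
            for x in range(0, W - pw + 1, stride):
                patches.append(video[t:t + pt, y:y + ph, x:x + pw].ravel())
                coords.append((t, y, x))
    return np.stack(patches), coords

def nearest_neighbors(queries, keys):
    """Index of the closest input patch (squared L2) for every query patch."""
    # Brute-force search; making this step efficient for millions of space-time
    # patches is exactly the practical problem the paper addresses.
    d = (queries ** 2).sum(1)[:, None] + (keys ** 2).sum(1)[None, :] - 2.0 * queries @ keys.T
    return d.argmin(axis=1)

def fold_patches(patches, coords, shape, pt=3, ph=7, pw=7):
    """Average overlapping patches back into a video of the given shape."""
    out, weight = np.zeros(shape), np.zeros(shape)
    for p, (t, y, x) in zip(patches, coords):
        out[t:t + pt, y:y + ph, x:x + pw] += p.reshape(pt, ph, pw, shape[-1])
        weight[t:t + pt, y:y + ph, x:x + pw] += 1.0
    return out / np.maximum(weight, 1e-8)

def generate(input_video, n_iters=5, noise_std=0.5, seed=0):
    """Start from a noisy copy of the input and repeatedly re-stitch the output
    out of the input's own space-time patches, yielding a diverse rearrangement."""
    rng = np.random.default_rng(seed)
    keys, _ = extract_patches(input_video)
    output = input_video + noise_std * rng.standard_normal(input_video.shape)
    for _ in range(n_iters):
        queries, coords = extract_patches(output)
        matched = keys[nearest_neighbors(queries, keys)]
        output = fold_patches(matched, coords, input_video.shape)
    return output
```
In practice such patch-based generation is run coarse-to-fine, injecting noise only at the coarsest scale, which is what yields outputs that are globally coherent yet diverse.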
Related papers
- ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning [36.378348127629195]
We propose a novel post-tuning methodology for video synthesis models, called ExVideo.
This approach is designed to enhance the capability of current video synthesis models, allowing them to produce content over extended temporal durations.
Our approach augments the model's capacity to generate up to $5\times$ its original number of frames, requiring only 1.5k GPU hours of training on a dataset comprising 40k videos.
arXiv Detail & Related papers (2024-06-20T09:18:54Z)
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [69.83405335645305]
We argue that naively bringing advances of image models to the video generation domain reduces motion fidelity and visual quality, and impairs scalability.
In this work, we build Snap Video, a video-first model that systematically addresses these challenges.
We show that a U-Net, the workhorse behind image generation, scales poorly when generating videos, requiring significant computational overhead.
Our transformer-based architecture instead allows us to efficiently train a text-to-video model with billions of parameters for the first time, reach state-of-the-art results on a number of benchmarks, and generate videos with substantially higher quality, temporal consistency, and motion complexity.
arXiv Detail & Related papers (2024-02-22T18:55:08Z)
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction [93.26613503521664]
This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction.
We propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions.
Our model generates transition videos that ensure coherence and visual quality.
arXiv Detail & Related papers (2023-10-31T17:58:17Z)
- WAIT: Feature Warping for Animation to Illustration video Translation using GANs [12.681919619814419]
We introduce a new problem for video stylization in which an unordered set of images is used.
Most video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
arXiv Detail & Related papers (2023-10-07T19:45:24Z)
- Video Generation Beyond a Single Clip [76.5306434379088]
Existing video generation models can only produce clips that are short relative to the length of real videos.
To generate long videos covering diverse content and multiple events, we propose to use additional guidance to control the video generation process.
The proposed approach is complementary to existing efforts on video generation, which focus on generating realistic video within a fixed time window.
arXiv Detail & Related papers (2023-04-15T06:17:30Z)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z)
- Diverse Video Generation from a Single Video [19.973264262422273]
GANs trained on a single video are able to perform generation and manipulation tasks.
In this paper we question the necessity of a GAN for generation from a single video.
We introduce a non-parametric baseline for a variety of generation and manipulation tasks.
arXiv Detail & Related papers (2022-05-11T18:36:48Z)
- Video Diffusion Models [47.99413440461512]
Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
We propose a diffusion model for video generation that shows very promising initial results.
We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
arXiv Detail & Related papers (2022-04-07T14:08:02Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
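As a rough illustration of the single-video-pair training idea in the Deep Video Prior entry above, the sketch below fits a small CNN to map original frames to their independently processed counterparts; with early stopping, the network tends to reproduce the consistent content before it fits the frame-to-frame flicker. This is a simplified, assumption-laden sketch rather than the authors' code: the network, loss, and training schedule are placeholders.
```python
# A simplified, hypothetical sketch of the single-video-pair training idea from
# the Deep Video Prior entry above (not the authors' code).
import torch
import torch.nn as nn

def small_cnn(channels=3, width=32):
    # Placeholder architecture; the actual prior network is a design choice.
    return nn.Sequential(
        nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, channels, 3, padding=1),
    )

def fit_video_prior(original, processed, steps=500, lr=1e-3):
    """original, processed: float tensors of shape (T, C, H, W) in [0, 1]."""
    net = small_cnn(original.shape[1])
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(steps):  # keep `steps` small: early stopping acts as the prior
        idx = torch.randint(0, original.shape[0], (1,)).item()  # random frame
        pred = net(original[idx:idx + 1])
        loss = loss_fn(pred, processed[idx:idx + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():  # re-render all frames with the (early-stopped) network
        return torch.cat([net(original[i:i + 1]) for i in range(original.shape[0])])
```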
This list is automatically generated from the titles and abstracts of the papers in this site.