Diverse Video Generation from a Single Video
- URL: http://arxiv.org/abs/2205.05725v1
- Date: Wed, 11 May 2022 18:36:48 GMT
- Title: Diverse Video Generation from a Single Video
- Authors: Niv Haim, Ben Feinstein, Niv Granot, Assaf Shocher, Shai Bagon, Tali
Dekel, Michal Irani
- Abstract summary: GANs trained on a single video are able to perform generation and manipulation tasks.
In this paper we question the necessity of a GAN for generation from a single video.
We introduce a non-parametric baseline for a variety of generation and manipulation tasks.
- Score: 19.973264262422273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: GANs are able to perform generation and manipulation tasks when trained on a single video. However, these single-video GANs require an unreasonable amount of time to train on a single video, rendering them almost impractical. In this
paper we question the necessity of a GAN for generation from a single video,
and introduce a non-parametric baseline for a variety of generation and
manipulation tasks. We revive classical space-time patch nearest-neighbor
approaches and adapt them to a scalable unconditional generative model, without
any learning. This simple baseline surprisingly outperforms single-video GANs
in visual quality and realism (confirmed by quantitative and qualitative
evaluations), and is disproportionately faster (runtime reduced from several
days to seconds). Our approach is easily scaled to Full-HD videos. We also use
the same framework to demonstrate video analogies and spatio-temporal
retargeting. These observations show that classical approaches significantly
outperform heavy deep learning machinery for these tasks. This sets a new
baseline for single-video generation and manipulation tasks, and no less
important -- makes diverse generation from a single video practically possible
for the first time.
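For intuition only, here is a minimal NumPy sketch of a single space-time patch nearest-neighbor pass of the kind the abstract alludes to: every space-time patch of a perturbed video is replaced by its closest patch from the input video, and overlapping replacements are averaged. This is not the authors' released implementation; the coarse-to-fine pyramid, noise schedule, and fast approximate nearest-neighbor search that make the actual method scalable to Full-HD are omitted, and names such as extract_patches and patch_nn_pass are hypothetical.

```python
import numpy as np

def extract_patches(video, ps):
    """All overlapping space-time patches of size ps=(pt, ph, pw) from a
    (T, H, W, C) video, each flattened into a row of length pt*ph*pw*C."""
    T, H, W, C = video.shape
    pt, ph, pw = ps
    rows = [video[t:t + pt, y:y + ph, x:x + pw].ravel()
            for t in range(T - pt + 1)
            for y in range(H - ph + 1)
            for x in range(W - pw + 1)]
    return np.stack(rows)

def patch_nn_pass(source, init, ps=(3, 5, 5)):
    """One nearest-neighbor pass: replace every space-time patch of `init`
    with its closest (L2) patch from `source`, then average the overlapping
    replacements back into a video."""
    T, H, W, C = init.shape
    pt, ph, pw = ps
    src = extract_patches(source, ps)    # (Ns, D) candidate patches
    qry = extract_patches(init, ps)      # (Nq, D) query patches
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (qry ** 2).sum(1, keepdims=True) - 2.0 * qry @ src.T + (src ** 2).sum(1)
    nn = d2.argmin(axis=1)               # best source patch for each query patch
    out = np.zeros(init.shape, dtype=np.float64)
    weight = np.zeros((T, H, W, 1))
    i = 0
    for t in range(T - pt + 1):          # fold chosen patches back, averaging overlaps
        for y in range(H - ph + 1):
            for x in range(W - pw + 1):
                out[t:t + pt, y:y + ph, x:x + pw] += src[nn[i]].reshape(pt, ph, pw, C)
                weight[t:t + pt, y:y + ph, x:x + pw] += 1.0
                i += 1
    return out / weight

# Toy usage: perturb the input with noise, then let the nearest-neighbor pass
# reassemble a different-but-plausible sample out of real input patches.
rng = np.random.default_rng(0)
video = rng.random((6, 24, 24, 3))       # stand-in for a real single input video
noisy = video + 0.3 * rng.standard_normal(video.shape)
sample = patch_nn_pass(video, noisy)
print(sample.shape)                      # (6, 24, 24, 3)
```

In the full method, a pass of this kind would be applied repeatedly across a spatio-temporal pyramid, starting from a coarsely noised version of the input; the sketch shows only the core replace-and-average step.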
Related papers
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video
Synthesis [69.83405335645305]
We argue that naively bringing advances of image models to the video generation domain reduces motion fidelity and visual quality, and impairs scalability.
In this work, we build Snap Video, a video-first model that systematically addresses these challenges.
We show that a U-Net - a workhorse behind image generation - scales poorly when generating videos, requiring significant computational overhead.
Replacing it with a scalable transformer-based architecture allows us to efficiently train a text-to-video model with billions of parameters for the first time, reach state-of-the-art results on a number of benchmarks, and generate videos with substantially higher quality, temporal consistency, and motion complexity.
arXiv Detail & Related papers (2024-02-22T18:55:08Z)
- VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation [87.13210748484217]
VideoCutLER is a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos.
We show the first competitive unsupervised learning results on the challenging YouTubeVIS 2019 benchmark, achieving 50.7% AP^video_50.
VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS 2019 in terms of AP^video.
arXiv Detail & Related papers (2023-08-28T17:10:12Z)
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- Video Generation Beyond a Single Clip [76.5306434379088]
Video generation models can only generate video clips that are relatively short compared with the length of real videos.
To generate long videos covering diverse content and multiple events, we propose to use additional guidance to control the video generation process.
The proposed approach is complementary to existing efforts on video generation, which focus on generating realistic video within a fixed time window.
arXiv Detail & Related papers (2023-04-15T06:17:30Z)
- Revealing Single Frame Bias for Video-and-Language Learning [115.01000652123882]
We show that a single-frame trained model can achieve better performance than existing methods that use multiple frames for training.
This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets.
We propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling.
arXiv Detail & Related papers (2022-06-07T16:28:30Z)
- Diverse Generation from a Single Video Made Possible [24.39972895902724]
We present a fast and practical method for video generation and manipulation from a single natural video.
Our method generates more realistic and higher quality results than single-video GANs.
arXiv Detail & Related papers (2021-09-17T15:12:17Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)