A Good Image Generator Is What You Need for High-Resolution Video Synthesis
- URL: http://arxiv.org/abs/2104.15069v1
- Date: Fri, 30 Apr 2021 15:38:41 GMT
- Title: A Good Image Generator Is What You Need for High-Resolution Video Synthesis
- Authors: Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris N.
Metaxas, Sergey Tulyakov
- Abstract summary: We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
- Score: 73.82857768949651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image and video synthesis are closely related areas aiming at generating
content from noise. While rapid progress has been demonstrated in improving
image-based models to handle large resolutions, high-quality renderings, and
wide variations in image content, achieving comparable video generation results
remains problematic. We present a framework that leverages contemporary image
generators to render high-resolution videos. We frame the video synthesis
problem as discovering a trajectory in the latent space of a pre-trained and
fixed image generator. Not only does such a framework render high-resolution
videos, but it is also an order of magnitude more computationally efficient. We
introduce a motion generator that discovers the desired trajectory, in which
content and motion are disentangled. With such a representation, our framework
allows for a broad range of applications, including content and motion
manipulation. Furthermore, we introduce a new task, which we call cross-domain
video synthesis, in which the image and motion generators are trained on
disjoint datasets belonging to different domains. This allows for generating
moving objects for which the desired video data is not available. Extensive
experiments on various datasets demonstrate the advantages of our methods over
existing video generation techniques. Code will be released at
https://github.com/snap-research/MoCoGAN-HD.
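To make the latent-trajectory idea concrete, the sketch below shows one way such a pipeline could be wired up in PyTorch. This is not the released MoCoGAN-HD code: the LSTM motion model, the cumulative residual-latent parameterization, and all module and argument names (MotionGenerator, render_video, image_generator) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MotionGenerator(nn.Module):
    """Maps a content code and per-frame motion noise to a latent trajectory."""

    def __init__(self, latent_dim=512, motion_dim=256, hidden_dim=512):
        super().__init__()
        self.rnn = nn.LSTM(motion_dim, hidden_dim, batch_first=True)
        self.to_residual = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_content, z_motion):
        # z_content: (B, latent_dim) content code, held fixed for the whole clip
        # z_motion:  (B, T, motion_dim) per-frame motion noise
        h, _ = self.rnn(z_motion)                 # (B, T, hidden_dim)
        residuals = self.to_residual(h)           # (B, T, latent_dim)
        # The trajectory is the content code plus accumulated motion residuals,
        # so the first frame stays close to the still image defined by z_content.
        return z_content.unsqueeze(1) + residuals.cumsum(dim=1)


@torch.no_grad()
def render_video(image_generator, motion_generator, z_content, z_motion):
    """Decode every latent on the trajectory with the frozen image generator."""
    trajectory = motion_generator(z_content, z_motion)        # (B, T, latent_dim)
    b, t, d = trajectory.shape
    frames = image_generator(trajectory.reshape(b * t, d))    # (B*T, 3, H, W)
    return frames.reshape(b, t, *frames.shape[1:])            # (B, T, 3, H, W)
```

Because only the motion generator would be trained (adversarially, on videos) while the image generator stays frozen, decoding the same content code along different trajectories keeps the subject fixed while varying its motion, which is the content/motion disentanglement the abstract refers to. It also means the two generators can, in principle, be trained on disjoint datasets, as in the cross-domain setting described above.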
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators [70.17041424896507]
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
We propose a new task of zero-shot text-to-video generation using existing text-to-image synthesis methods.
Our method performs comparably or sometimes better than recent approaches, despite not being trained on additional video data.
arXiv Detail & Related papers (2023-03-23T17:01:59Z)
- MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs [16.964371778504297]
We present MotionVideoGAN, a novel video generator synthesizing videos based on the motion space learned by pre-trained image pair generators.
Motion codes help us edit images within the motion space, since an edited image shares the same content as the other, unchanged image in the pair.
Our approach achieves state-of-the-art performance on the most complex video dataset ever used for unconditional video generation evaluation, UCF101.
arXiv Detail & Related papers (2023-03-06T05:52:13Z)
- Autoencoding Video Latents for Adversarial Video Generation [0.0]
AVLAE is a two-stream latent autoencoder in which the video distribution is learned by adversarial training.
We demonstrate that our approach learns to disentangle motion and appearance codes even without the explicit structural composition in the generator.
arXiv Detail & Related papers (2022-01-18T11:42:14Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; high-frame-rate videos are not needed.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)