StyleInV: A Temporal Style Modulated Inversion Network for Unconditional
Video Generation
- URL: http://arxiv.org/abs/2308.16909v1
- Date: Thu, 31 Aug 2023 17:59:33 GMT
- Title: StyleInV: A Temporal Style Modulated Inversion Network for Unconditional
Video Generation
- Authors: Yuhan Wang, Liming Jiang, Chen Change Loy
- Abstract summary: We introduce a novel motion generator design that uses a learning-based inversion network for GAN.
Our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator.
- Score: 73.54398908446906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unconditional video generation is a challenging task that involves
synthesizing high-quality videos that are both coherent and of extended
duration. To address this challenge, researchers have used pretrained StyleGAN
image generators for high-quality frame synthesis and focused on motion
generator design. The motion generator is trained in an autoregressive manner
using heavy 3D convolutional discriminators to ensure motion coherence during
video generation. In this paper, we introduce a novel motion generator design
that uses a learning-based inversion network for GAN. The encoder in our method
captures rich and smooth priors from encoding images to latents, and given the
latent of an initially generated frame as guidance, our method can generate
smooth future latents by modulating the inversion encoder temporally. Our method
enjoys the advantage of sparse training and naturally constrains the generation
space of our motion generator with the inversion network guided by the initial
frame, eliminating the need for heavy discriminators. Moreover, our method
supports style transfer with simple fine-tuning when the encoder is paired with
a pretrained StyleGAN generator. Extensive experiments conducted on various
benchmarks demonstrate the superiority of our method in generating long and
high-resolution videos with decent single-frame quality and temporal
consistency.
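A minimal conceptual sketch of the idea above (PyTorch; all module and variable names are invented for illustration and this is not the authors' released code): a temporal style computed from a per-video motion noise vector and a timestamp modulates a lightweight inversion-style encoder, which predicts the latent of frame t as a residual around the initial-frame latent; a frozen, pretrained StyleGAN would then decode each latent into a frame.
```python
import torch
import torch.nn as nn

class TemporalStyleMLP(nn.Module):
    """Maps a per-video motion noise vector and a timestamp to a modulation style."""
    def __init__(self, noise_dim=512, style_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + 1, style_dim), nn.LeakyReLU(0.2),
            nn.Linear(style_dim, style_dim),
        )

    def forward(self, motion_noise, t):
        # motion_noise: (B, noise_dim); t: (B, 1) timestamp normalized to [0, 1]
        return self.net(torch.cat([motion_noise, t], dim=1))

class ModulatedInversionEncoder(nn.Module):
    """Predicts the latent of frame t from the initial latent, modulated by the temporal style."""
    def __init__(self, w_dim=512, style_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(w_dim, w_dim), nn.LeakyReLU(0.2))
        self.scale = nn.Linear(style_dim, w_dim)  # style -> per-channel scale
        self.shift = nn.Linear(style_dim, w_dim)  # style -> per-channel shift
        self.head = nn.Linear(w_dim, w_dim)

    def forward(self, w0, temporal_style):
        h = self.backbone(w0)
        h = h * (1 + self.scale(temporal_style)) + self.shift(temporal_style)  # style modulation
        return w0 + self.head(h)  # residual around the initial-frame latent keeps frames anchored

# Roll out a short latent trajectory; a frozen StyleGAN would decode each latent into a frame.
B, w_dim, num_frames = 2, 512, 8
w0 = torch.randn(B, w_dim)            # latent of the initially generated frame
motion_noise = torch.randn(B, 512)    # one motion code per video
style_mlp, encoder = TemporalStyleMLP(), ModulatedInversionEncoder()
latents = [w0]
for i in range(1, num_frames):
    t = torch.full((B, 1), i / (num_frames - 1))
    latents.append(encoder(w0, style_mlp(motion_noise, t)))
# frames = [pretrained_stylegan.synthesis(w) for w in latents]  # hypothetical decoder call
```
Because every frame latent is predicted from the same initial-frame latent rather than autoregressively from the previous frame, the rollout can be trained on sparsely sampled timestamps, which is the property the abstract highlights.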
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce a 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation [115.09597127418452]
Latent-Shift is an efficient text-to-video generation method based on a pretrained text-to-image generation model.
We show that Latent-Shift achieves comparable or better results while being significantly more efficient (a generic sketch of the temporal-shift idea appears after this list).
arXiv Detail & Related papers (2023-04-17T17:57:06Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinite, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks [68.93429034530077]
We propose dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128-frame videos at 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)
- Feature-Style Encoder for Style-Based GAN Inversion [1.9116784879310027]
We propose a novel architecture for GAN inversion, which we call Feature-Style encoder.
Our model achieves accurate inversion of real images from the latent space of a pre-trained style-based GAN model.
Thanks to its encoder structure, the model allows fast and accurate image editing.
arXiv Detail & Related papers (2022-02-04T15:19:34Z)
- Autoencoding Video Latents for Adversarial Video Generation [0.0]
AVLAE is a two-stream latent autoencoder in which the video distribution is learned by adversarial training.
We demonstrate that our approach learns to disentangle motion and appearance codes even without the explicit structural composition in the generator.
arXiv Detail & Related papers (2022-01-18T11:42:14Z)
- AE-StyleGAN: Improved Training of Style-Based Auto-Encoders [21.51697087024866]
StyleGANs have shown impressive results on data generation and manipulation in recent years.
In this paper, we focus on style-based generators and ask a scientific question: does forcing such a generator to reconstruct real data lead to a more disentangled latent space and make the inversion process from image to latent space easier?
We describe a new methodology to train a style-based autoencoder where the encoder and generator are optimized end-to-end.
arXiv Detail & Related papers (2021-10-17T04:25:51Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
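The temporal shift referenced in the Latent-Shift entry above can be illustrated with a generic, parameter-free channel shift along the time axis, in the spirit of temporal shift modules. This is a simplified sketch, not the paper's exact implementation; the placement inside the network and the shifted proportion are illustrative assumptions.
```python
import torch

def temporal_shift(x: torch.Tensor, shift_fraction: int = 8) -> torch.Tensor:
    """x: (B, T, C, H, W). Shift C // shift_fraction channels forward by one frame and the
    same number backward by one frame; the remaining channels are left untouched."""
    b, t, c, h, w = x.shape
    fold = c // shift_fraction
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # keep the rest as-is
    return out

video_feats = torch.randn(2, 16, 64, 32, 32)  # (batch, frames, channels, height, width)
mixed = temporal_shift(video_feats)           # same shape, with channels mixed across frames
```
Because the shift adds no parameters, a pretrained text-to-image backbone can be reused for video with little extra cost, which is the efficiency argument made in that entry.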