GenDeF: Learning Generative Deformation Field for Video Generation
- URL: http://arxiv.org/abs/2312.04561v1
- Date: Thu, 7 Dec 2023 18:59:41 GMT
- Title: GenDeF: Learning Generative Deformation Field for Video Generation
- Authors: Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang,
Yujun Shen, Chunhua Shen
- Abstract summary: We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
- Score: 89.49567113452396
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We offer a new perspective on approaching the task of video generation.
Instead of directly synthesizing a sequence of frames, we propose to render a
video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages. First, we can sufficiently
reuse a well-trained image generator to synthesize the static image (also
called canonical image), alleviating the difficulty in producing a video and
thereby resulting in better visual quality. Second, we can easily convert a
deformation field to optical flows, making it possible to apply explicit
structural regularizations for motion modeling, leading to temporally
consistent results. Third, the disentanglement between content and motion
allows users to process a synthesized video by processing its
corresponding static image without any tuning, facilitating many applications
like video editing, keypoint tracking, and video segmentation. Both qualitative
and quantitative results on three common video generation benchmarks
demonstrate the superiority of our GenDeF method.
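The mechanism described above (warp one canonical image with a per-frame deformation field, read the field off as optical flow, and regularize that flow) can be illustrated with a short code sketch. The PyTorch snippet below is a minimal illustration under assumed names and shapes (warp_canonical, deformation_to_flow, temporal_smoothness, and the particular smoothness penalty are all hypothetical), not the authors' implementation.

```python
# Minimal sketch of the GenDeF idea, assuming a frozen image generator has
# already produced a canonical image and a motion branch predicts a per-frame
# deformation field as a normalized sampling grid (hypothetical shapes/names).
import torch
import torch.nn.functional as F


def warp_canonical(canonical, deformation):
    """canonical: (1, C, H, W) canonical image.
    deformation: (T, H, W, 2) sampling grid in normalized [-1, 1] coordinates.
    Returns a (T, C, H, W) video obtained purely by resampling the image."""
    T = deformation.shape[0]
    frames = canonical.expand(T, -1, -1, -1)
    return F.grid_sample(frames, deformation, align_corners=True)


def deformation_to_flow(deformation):
    """Convert the normalized sampling grid into pixel-space displacements
    (an optical-flow-like field) relative to the identity grid."""
    T, H, W, _ = deformation.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    identity = torch.stack([xs, ys], dim=-1)              # (H, W, 2)
    disp = deformation - identity                         # normalized units
    scale = torch.tensor([(W - 1) / 2.0, (H - 1) / 2.0])
    return disp * scale                                   # pixels, (T, H, W, 2)


def temporal_smoothness(flow):
    """An assumed example of an explicit structural regularizer: penalize
    abrupt changes of the flow between consecutive frames."""
    return (flow[1:] - flow[:-1]).abs().mean()


# Usage sketch with stand-in tensors (no trained models involved).
canonical = torch.rand(1, 3, 64, 64)            # stand-in for a generated image
deformation = torch.rand(8, 64, 64, 2) * 2 - 1  # stand-in for a predicted field
video = warp_canonical(canonical, deformation)  # (8, 3, 64, 64)
flow = deformation_to_flow(deformation)         # (8, 64, 64, 2)
motion_reg = temporal_smoothness(flow)          # scalar loss term
```

Because the video is produced purely by resampling the canonical image, replacing that image with an edited or segmented version re-renders the whole clip without any tuning, which is the content/motion disentanglement the abstract highlights.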
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce the downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
In the first stage, we employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture [11.587428534308945]
EasyAnimate is an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes.
We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block.
We provide a holistic ecosystem for video production based on DiT, encompassing aspects such as data pre-processing, VAE training, DiT models training, and end-to-end video inference.
arXiv Detail & Related papers (2024-05-29T11:11:07Z)
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing [89.49585127724941]
CoDeF is a new type of video representation, which consists of a canonical content field and a temporal deformation field.
We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training (a code sketch of this lifting idea follows the related-papers list below).
arXiv Detail & Related papers (2023-08-15T17:59:56Z)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Encode-in-Style: Latent-based Video Encoding using StyleGAN2 [0.7614628596146599]
We propose an end-to-end facial video encoding approach that facilitates data-efficient high-quality video re-synthesis.
The approach builds on StyleGAN2 image inversion and multi-stage non-linear latent-space editing to generate videos that are nearly comparable to input videos.
arXiv Detail & Related papers (2022-03-28T05:44:19Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
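Both CoDeF above and GenDeF rely on the same disentanglement: content lives in a canonical image and motion lives in a deformation field, which is what lets an image-space result (an edit, a segmentation mask, a set of keypoints) be carried to every frame without extra training. The sketch below, referenced from the CoDeF entry, illustrates that lifting for keypoint tracking via a brute-force inversion of the backward deformation field; the function names, shapes, and the inversion strategy are assumptions for illustration, not code from either paper.

```python
# Hedged sketch of "lifting" keypoint detection to tracking: detect keypoints
# once on the canonical image, then locate, for every frame, the pixel whose
# canonical-image lookup lands closest to each keypoint. Names/shapes assumed.
import torch


def track_keypoints(keypoints_canonical, deformation, image_size):
    """keypoints_canonical: (K, 2) pixel (x, y) coords on the canonical image.
    deformation: (T, H, W, 2) normalized grid_sample grid, i.e. where each
    frame pixel reads from the canonical image (a backward warp).
    Returns (T, K, 2) keypoint positions in every frame."""
    H, W = image_size
    T = deformation.shape[0]
    # Canonical-image pixel coordinates sampled by every frame pixel.
    src_x = (deformation[..., 0] + 1) / 2 * (W - 1)                # (T, H, W)
    src_y = (deformation[..., 1] + 1) / 2 * (H - 1)
    src = torch.stack([src_x, src_y], dim=-1).reshape(T, -1, 2)    # (T, H*W, 2)
    # Brute-force inversion: nearest canonical lookup per keypoint.
    diff = src.unsqueeze(2) - keypoints_canonical.float().view(1, 1, -1, 2)
    idx = diff.norm(dim=-1).argmin(dim=1)                          # (T, K)
    tracked_x = (idx % W).float()
    tracked_y = torch.div(idx, W, rounding_mode="floor").float()
    return torch.stack([tracked_x, tracked_y], dim=-1)             # (T, K, 2)


# Usage sketch with stand-in tensors: three keypoints found once on the
# canonical image are located in all 8 frames without any video training.
deformation = torch.rand(8, 64, 64, 2) * 2 - 1
keypoints = torch.tensor([[10.0, 20.0], [32.0, 32.0], [50.0, 12.0]])
tracks = track_keypoints(keypoints, deformation, image_size=(64, 64))  # (8, 3, 2)
```

The same trick applies to video editing: modify the canonical image (e.g. stylize or inpaint it) and re-run the warp from the earlier sketch to obtain the edited clip.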