Encode-in-Style: Latent-based Video Encoding using StyleGAN2
- URL: http://arxiv.org/abs/2203.14512v1
- Date: Mon, 28 Mar 2022 05:44:19 GMT
- Title: Encode-in-Style: Latent-based Video Encoding using StyleGAN2
- Authors: Trevine Oorloff, Yaser Yacoob
- Abstract summary: We propose an end-to-end facial video encoding approach that facilitates data-efficient high-quality video re-synthesis.
The approach builds on StyleGAN2 image inversion and multi-stage non-linear latent-space editing to generate videos that are nearly comparable to input videos.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an end-to-end facial video encoding approach that facilitates
data-efficient high-quality video re-synthesis by optimizing low-dimensional
edits of a single Identity-latent. The approach builds on StyleGAN2 image
inversion and multi-stage non-linear latent-space editing to generate videos
that are nearly comparable to input videos. It economically captures face
identity, head-pose, and complex facial motions at fine levels, and thereby
bypasses training and person modeling which tend to hamper many re-synthesis
approaches. The approach is designed with maximum data efficiency, where a
single W+ latent and 35 parameters per frame enable high-fidelity video
rendering. This pipeline can also be used for puppeteering (i.e., motion
transfer).
Related papers
- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping [43.30061680192465]
We present the first diffusion-based framework specifically designed for video face swapping.
Our approach incorporates a specially designed diffusion model coupled with a VidFaceVAE.
Our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods.
arXiv Detail & Related papers (2024-12-15T18:58:32Z)
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z)
- Depth Any Video with Scalable Synthetic Data [98.42356740981839]
We develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse synthetic environments.
We leverage the powerful priors of generative video diffusion models to handle real-world videos effectively.
Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.
arXiv Detail & Related papers (2024-10-14T17:59:46Z)
- Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization [13.341123726068652]
We propose a novel Multi-granularity Temporal Trajectory Factorization framework for generative human video compression.
Experimental results show that the proposed method outperforms the latest generative models and the state-of-the-art video coding standard, Versatile Video Coding.
arXiv Detail & Related papers (2024-10-14T05:34:32Z)
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [118.72266141321647]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is capable of both comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)