VToonify: Controllable High-Resolution Portrait Video Style Transfer
- URL: http://arxiv.org/abs/2209.11224v2
- Date: Fri, 23 Sep 2022 08:01:10 GMT
- Title: VToonify: Controllable High-Resolution Portrait Video Style Transfer
- Authors: Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy
- Abstract summary: We introduce a novel VToonify framework for controllable high-resolution portrait video style transfer.
We leverage the mid- and high-resolution layers of StyleGAN to render artistic portraits based on the multi-scale content features extracted by an encoder.
Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity.
- Score: 103.54337984566877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating high-quality artistic portrait videos is an important and
desirable task in computer graphics and vision. Although a series of successful
portrait image toonification models built upon the powerful StyleGAN have been
proposed, these image-oriented methods have obvious limitations when applied to
videos, such as the fixed frame size, the requirement of face alignment,
missing non-facial details and temporal inconsistency. In this work, we
investigate the challenging controllable high-resolution portrait video style
transfer by introducing a novel VToonify framework. Specifically, VToonify
leverages the mid- and high-resolution layers of StyleGAN to render
high-quality artistic portraits based on the multi-scale content features
extracted by an encoder to better preserve the frame details. The resulting
fully convolutional architecture accepts non-aligned faces in videos of
variable size as input, contributing to complete face regions with natural
motions in the output. Our framework is compatible with existing StyleGAN-based
image toonification models to extend them to video toonification, and inherits
appealing features of these models for flexible style control on color and
intensity. This work presents two instantiations of VToonify built upon Toonify
and DualStyleGAN for collection-based and exemplar-based portrait video style
transfer, respectively. Extensive experimental results demonstrate the
effectiveness of our proposed VToonify framework over existing methods in
generating high-quality and temporally-coherent artistic portrait videos with
flexible style controls.
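
To make the pipeline described above concrete, below is a minimal, hypothetical PyTorch-style sketch of a fully convolutional toonification network: an encoder extracts content features at several scales from an unaligned frame of arbitrary size, and a small decoder, standing in for the mid- and high-resolution layers of a StyleGAN generator, fuses those features while upsampling back to the input resolution. The module names, channel widths, and skip-fusion scheme are illustrative assumptions and do not reproduce the authors' released implementation; style conditioning and the pretrained StyleGAN weights are omitted.

import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=1):
    # 3x3 convolution followed by LeakyReLU, the basic fully convolutional unit.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

class ContentEncoder(nn.Module):
    # Extracts content features at 1/2, 1/4 and 1/8 of the input resolution.
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(conv_block(3, 32), conv_block(32, 64, stride=2))   # 1/2
        self.stage2 = conv_block(64, 128, stride=2)                                    # 1/4
        self.stage3 = conv_block(128, 256, stride=2)                                   # 1/8

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        return f3, f2, f1  # coarse-to-fine multi-scale content features

class ToonDecoder(nn.Module):
    # Stand-in for the generator's mid/high-resolution layers: upsample the
    # coarse features and fuse the finer encoder features via skip connections.
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.fuse1 = conv_block(256 + 128, 128)
        self.fuse2 = conv_block(128 + 64, 64)
        self.to_rgb = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, feats):
        f3, f2, f1 = feats
        x = self.fuse1(torch.cat([self.up(f3), f2], dim=1))
        x = self.fuse2(torch.cat([self.up(x), f1], dim=1))
        return torch.tanh(self.to_rgb(self.up(x)))  # back to the input resolution

if __name__ == "__main__":
    # Fully convolutional, so any frame size divisible by 8 works, aligned or not.
    frame = torch.randn(1, 3, 360, 640)
    out = ToonDecoder()(ContentEncoder()(frame))
    print(out.shape)  # torch.Size([1, 3, 360, 640])
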
Related papers
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z)
- StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3 [43.43545400625567]
We propose a principled framework named StyleFaceV, which produces high-fidelity identity-preserving face videos with vivid movements.
Our core insight is to decompose appearance and pose information and recompose them in the latent space of StyleGAN3 to produce stable and dynamic results.
arXiv Detail & Related papers (2022-08-16T17:47:03Z)
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN [49.917296433657484]
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image.
In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties.
We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities.
arXiv Detail & Related papers (2022-03-08T12:06:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.