StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN
- URL: http://arxiv.org/abs/2203.04036v1
- Date: Tue, 8 Mar 2022 12:06:12 GMT
- Title: StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN
- Authors: Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, and Yujiu Yang
- Abstract summary: One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image.
In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover that it has excellent spatial transformation properties.
We propose a novel unified framework based on a pre-trained StyleGAN that enables high-resolution video generation, disentangled control by driving video or audio, and flexible face editing.
- Score: 49.917296433657484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot talking face generation aims at synthesizing a high-quality talking
face video from an arbitrary portrait image, driven by a video or an audio
segment. One challenging quality factor is the resolution of the output video:
higher resolution conveys more details. In this work, we investigate the latent
feature space of a pre-trained StyleGAN and discover that it has excellent spatial
transformation properties. Based on this observation, we explore the possibility of
using a pre-trained StyleGAN to break through the resolution limit of training
datasets. We propose a novel unified framework based on a pre-trained StyleGAN
that enables a set of powerful functionalities, i.e., high-resolution video
generation, disentangled control by driving video or audio, and flexible face
editing. Our framework elevates the resolution of the synthesized talking face
to 1024×1024 for the first time, even though the training dataset has a lower
resolution. We design a video-based motion generation module and an audio-based
one, which can be plugged into the framework either individually or jointly to
drive the video generation. The predicted motion is used to transform the
latent features of StyleGAN for visual animation. To compensate for the
transformation distortion, we propose a calibration network as well as a domain
loss to refine the features. Moreover, our framework allows two types of facial
editing, i.e., global editing via GAN inversion and intuitive editing based on
3D morphable models. Comprehensive experiments demonstrate superior video quality,
more flexible controllability, and better editability than state-of-the-art methods.
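The core mechanism lends itself to a compact sketch. The following is a minimal, hypothetical illustration (not the authors' released code) of the operation the abstract describes: a predicted dense flow field spatially warps an intermediate StyleGAN feature map via bilinear sampling, and a residual calibration step refines the warped features. The names `motion_predictor` and `calibration_net` are assumed placeholders, not identifiers from the paper.

```python
# Minimal sketch of motion-driven feature warping, assuming PyTorch.
# The flow field is expected in normalized [-1, 1] coordinates, as
# required by F.grid_sample.
import torch
import torch.nn.functional as F

def identity_grid(n, h, w, device):
    """Identity sampling grid of shape (N, H, W, 2), values in [-1, 1]."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)

def warp_features(feat, flow):
    """Warp a feature map (N, C, H, W) with a flow field (N, 2, H, W)."""
    n, _, h, w = feat.shape
    grid = identity_grid(n, h, w, feat.device) + flow.permute(0, 2, 3, 1)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Hypothetical use inside a frozen, pre-trained StyleGAN forward pass:
#   flow = motion_predictor(driving_signal)   # video- or audio-based module
#   feat = warp_features(feat, flow)          # spatial transformation
#   feat = feat + calibration_net(feat)       # compensate warping distortion
```

In this reading, warping intermediate features rather than output pixels is what lets the frozen high-resolution synthesis layers continue to produce 1024×1024 frames even though the driving data has lower resolution.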
Related papers
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method that controls facial expression deformation based on driving audio.
Experiments show that our method outperforms state-of-the-art methods on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models [18.36472998650704]
We introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model.
Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits.
arXiv Detail & Related papers (2024-05-26T11:47:40Z)
- GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- MagicStick: Controllable Video Editing via Control Handle Transformations [109.26314726025097]
MagicStick is a controllable video editing method that edits video properties by applying transformations to extracted internal control signals.
We present experiments on numerous examples within our unified framework.
We also compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating superior temporal consistency and editing capability compared with previous works.
arXiv Detail & Related papers (2023-12-05T17:58:06Z)
- High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression.
We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion.
We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z)
- Video-P2P: Video Editing with Cross-attention Control [68.64804243427756]
Video-P2P is a novel framework for real-world video editing with cross-attention control.
Video-P2P works well on real-world videos for generating new characters while optimally preserving their original poses and scenes.
arXiv Detail & Related papers (2023-03-08T17:53:49Z)
- VToonify: Controllable High-Resolution Portrait Video Style Transfer [103.54337984566877]
We introduce a novel VToonify framework for controllable high-resolution portrait video style transfer.
We leverage the mid- and high-resolution layers of StyleGAN to render artistic portraits based on the multi-scale content features extracted by an encoder.
Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity.
arXiv Detail & Related papers (2022-09-22T17:59:10Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity-preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
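To make the rig-to-latent mapping in the last entry concrete, here is a minimal, hypothetical sketch in the spirit of StyleRig (not its actual architecture): a small MLP maps 3D morphable model control parameters to an offset on an inverted W+ latent code, which a frozen StyleGAN then decodes. The class name, layer sizes, and the 257-dimensional rig vector are all illustrative assumptions.

```python
# Hypothetical rig-to-latent mapping in the spirit of StyleRig; all
# dimensions are illustrative (257-dim 3DMM vector, 18x512 W+ code).
import torch
import torch.nn as nn

class RigToLatent(nn.Module):
    def __init__(self, rig_dim=257, n_layers=18, latent_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(rig_dim, 512),
            nn.ELU(),
            nn.Linear(512, n_layers * latent_dim),
        )
        self.n_layers, self.latent_dim = n_layers, latent_dim

    def forward(self, w_plus, rig_params):
        """w_plus: (N, 18, 512) code from GAN inversion of a real portrait;
        rig_params: (N, 257) 3DMM pose/expression/illumination parameters."""
        delta = self.mlp(rig_params).view(-1, self.n_layers, self.latent_dim)
        return w_plus + delta  # edited code, decoded by the frozen StyleGAN
```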