DPE: Disentanglement of Pose and Expression for General Video Portrait
Editing
- URL: http://arxiv.org/abs/2301.06281v1
- Date: Mon, 16 Jan 2023 06:39:51 GMT
- Title: DPE: Disentanglement of Pose and Expression for General Video Portrait
Editing
- Authors: Youxin Pang, Yong Zhang, Weize Quan, Yanbo Fan, Xiaodong Cun, Ying
Shan, Dong-Ming Yan
- Abstract summary: One-shot video-driven talking face generation aims at producing a synthetic talking video by transferring the facial motion from a video to an arbitrary portrait image.
In this paper, we introduce a novel self-supervised disentanglement framework to decouple pose and expression without 3DMMs and paired data.
- Score: 30.1002454931945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot video-driven talking face generation aims at producing a synthetic
talking video by transferring the facial motion from a video to an arbitrary
portrait image. Head pose and facial expression are always entangled in facial
motion and transferred simultaneously. However, the entanglement sets up a
barrier to using these methods directly in video portrait editing, where one
may need to modify the expression only while keeping the pose unchanged. One
challenge of decoupling pose and expression is the lack of
paired data, such as the same pose but different expressions. Only a few
methods attempt to tackle this challenge with the aid of 3D Morphable Models
(3DMMs) for explicit disentanglement. However, 3DMMs are not accurate enough to
capture facial details due to their limited number of blendshapes, which has
side
effects on motion transfer. In this paper, we introduce a novel self-supervised
disentanglement framework to decouple pose and expression without 3DMMs and
paired data, which consists of a motion editing module, a pose generator, and
an expression generator. The editing module projects faces into a latent space
where pose motion and expression motion can be disentangled, and the pose or
expression transfer can be performed in the latent space conveniently via
addition. The two generators render the pose-modified and expression-modified
latent codes into images, respectively. Moreover, to guarantee the
disentanglement, we propose a
bidirectional cyclic training strategy with well-designed constraints.
Evaluations demonstrate that our method can control pose or expression
independently and can be used for general video editing.
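Since the abstract describes transfer as simple addition in a learned latent
space, followed by rendering with dedicated generators, a minimal PyTorch
sketch may help make the pipeline concrete. The module names (MotionEditor,
Generator), the split latent layout, and the relative-motion delta below are
illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the described pipeline, NOT the authors' code.
# Module names, the split latent layout, and all layer shapes are illustrative.
import torch
import torch.nn as nn


class MotionEditor(nn.Module):
    """Projects a face image into a latent space where pose motion and
    expression motion occupy separable components (assumed layout:
    first half pose code, second half expression code)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.dim = dim
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 2 * dim),
        )

    def forward(self, img: torch.Tensor):
        code = self.encoder(img)
        return code[:, :self.dim], code[:, self.dim:]  # (pose, expression)


class Generator(nn.Module):
    """Stand-in for the pose/expression generators that render an edited
    latent code back to an image (here a toy 64x64 decoder)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.render = nn.Sequential(nn.Linear(2 * dim, 3 * 64 * 64), nn.Tanh())

    def forward(self, pose: torch.Tensor, expr: torch.Tensor):
        return self.render(torch.cat([pose, expr], dim=1)).view(-1, 3, 64, 64)


editor = MotionEditor()
expr_gen = Generator()  # expression generator; a pose generator would mirror it

src = torch.randn(1, 3, 256, 256)    # source portrait frame
drv_0 = torch.randn(1, 3, 256, 256)  # first driving frame (reference)
drv_t = torch.randn(1, 3, 256, 256)  # current driving frame

src_pose, src_expr = editor(src)
_, expr_0 = editor(drv_0)
_, expr_t = editor(drv_t)

# "Transfer via addition": add the driving clip's expression motion to the
# source expression code while leaving the source pose code untouched.
edited = expr_gen(src_pose, src_expr + (expr_t - expr_0))

# Training (not shown): the paper's bidirectional cyclic strategy would,
# e.g., transfer expression A->B and then B->A and constrain the round trip
# to reconstruct the originals, keeping pose and expression disentangled.
```

The relative delta (expr_t - expr_0) follows the common relative-motion
formulation in one-shot talking-face work; whether DPE uses absolute or
relative codes is not stated in the abstract.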
Related papers
- MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing [90.30646271720919]
Novel View Synthesis (NVS) and 3D generation have recently achieved notable progress.
We propose MVInpainter, which re-formulates 3D editing as a multi-view 2D inpainting task.
MVInpainter partially inpaints multi-view images with the reference guidance rather than intractably generating an entirely novel view from scratch.
arXiv Detail & Related papers (2024-08-15T07:57:28Z)
- Editing 3D Scenes via Text Prompts without Retraining [80.57814031701744]
DN2N is a text-driven editing method that allows for the direct acquisition of a NeRF model with universal editing capabilities.
Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images.
Our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer.
arXiv Detail & Related papers (2023-09-10T02:31:50Z)
- High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression.
We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion.
We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z)
- POCE: Pose-Controllable Expression Editing [75.7701103792032]
This paper presents POCE, an innovative pose-controllable expression editing network.
It can generate realistic facial expressions and head poses simultaneously with just unpaired training images.
The learned model can generate realistic and high-fidelity facial expressions under various new poses.
arXiv Detail & Related papers (2023-04-18T12:26:19Z)
- Continuously Controllable Facial Expression Editing in Talking Face Videos [34.83353695337335]
Speech-related expressions and emotion-related expressions are often highly coupled.
Traditional image-to-image translation methods cannot work well in our application.
We propose a high-quality facial expression editing method for talking face videos.
arXiv Detail & Related papers (2022-09-17T09:05:47Z)
- Explicitly Controllable 3D-Aware Portrait Generation [42.30481422714532]
We propose a 3D portrait generation network that produces consistent portraits according to semantic parameters regarding pose, identity, expression and lighting.
Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions under natural lighting when viewed from free viewpoints.
arXiv Detail & Related papers (2022-09-12T17:40:08Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- 3D GAN Inversion for Controllable Portrait Image Animation [45.55581298551192]
We leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency.
The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer.
arXiv Detail & Related papers (2022-03-25T04:06:06Z)
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
- Pixel Sampling for Style Preserving Face Pose Editing [53.14006941396712]
We present a novel two-stage approach that casts the task of face pose manipulation as face inpainting.
By selectively sampling pixels from the input face and slightly adjusting their relative locations, the editing result faithfully preserves both the identity information and the image style.
With the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing.
arXiv Detail & Related papers (2021-06-14T11:29:29Z)