Controllable One-Shot Face Video Synthesis With Semantic Aware Prior
- URL: http://arxiv.org/abs/2304.14471v1
- Date: Thu, 27 Apr 2023 19:17:13 GMT
- Title: Controllable One-Shot Face Video Synthesis With Semantic Aware Prior
- Authors: Kangning Liu, Yu-Chuan Su, Wei (Alex) Hong, Ruijin Cang, Xuhui Jia
- Abstract summary: The one-shot talking-head synthesis task aims to animate a source image to the pose and expression dictated by a driving frame.
Recent methods rely on warping appearance features extracted from the source using motion fields estimated from sparse keypoints learned in an unsupervised manner.
We propose a novel method that leverages rich face prior information.
- Score: 10.968343822308812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The one-shot talking-head synthesis task aims to animate a source image to
another pose and expression, which is dictated by a driving frame. Recent
methods rely on warping appearance features extracted from the source using
motion fields estimated from sparse keypoints learned in an unsupervised
manner. Due to their lightweight formulation, they are suitable for video
conferencing with reduced bandwidth. However, based on our study, current
methods suffer from two major limitations: 1) unsatisfactory generation quality
in the case of large head poses and observable pose misalignment between the
source and the first frame of the driving video; 2) failure to capture fine yet
critical face motion details due to the lack of semantic understanding and
appropriate face geometry regularization. To address these shortcomings, we
propose a novel method that leverages rich face prior information. The proposed
model generates face videos with improved semantic consistency (improving the
baseline by $7\%$ in average keypoint distance) and better expression
preservation (outperforming the baseline by $15\%$ in average emotion embedding
distance) under equivalent bandwidth. Additionally, incorporating such prior
information provides a convenient interface for highly controllable generation
of both pose and expression.
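The keypoint-driven warping that these baselines rely on can be illustrated with a minimal sketch: a dense backward-warp field is assembled from the motion of sparse keypoints between the source image and the driving frame, and the source appearance features are resampled along it. This is only an illustration of the general mechanism, not the paper's implementation; the Gaussian-weighted blending of per-keypoint translations, the function names, and the tensor shapes below are assumptions standing in for the learned dense-motion network used in practice.

```python
# Illustrative sketch (assumed, not the paper's code): warp source appearance
# features with a dense motion field built from sparse keypoint pairs.
import torch
import torch.nn.functional as F


def make_coordinate_grid(h, w, device):
    """Normalized (x, y) grid in [-1, 1], shape (h, w, 2)."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([grid_x, grid_y], dim=-1)


def sparse_to_dense_motion(kp_source, kp_driving, h, w, sigma=0.1):
    """Build a dense backward-warp grid from sparse keypoint pairs.

    kp_source, kp_driving: (B, K, 2) keypoints in [-1, 1] (x, y) coordinates.
    Returns a sampling grid of shape (B, h, w, 2) for F.grid_sample.
    Each pixel follows the translations of nearby driving keypoints, blended
    with Gaussian weights (a simple stand-in for a learned dense-motion net).
    """
    b, k, _ = kp_driving.shape
    grid = make_coordinate_grid(h, w, kp_driving.device)       # (h, w, 2)
    grid = grid.view(1, 1, h, w, 2)
    kp_d = kp_driving.view(b, k, 1, 1, 2)
    kp_s = kp_source.view(b, k, 1, 1, 2)

    # Soft assignment of each pixel to keypoints, based on the driving frame.
    dist2 = ((grid - kp_d) ** 2).sum(-1)                       # (B, K, h, w)
    weights = torch.softmax(-dist2 / (2 * sigma ** 2), dim=1)  # (B, K, h, w)

    # Per-keypoint translation from driving back to source, blended per pixel.
    translation = kp_s - kp_d                                  # (B, K, 1, 1, 2)
    flow = (weights.unsqueeze(-1) * translation).sum(dim=1)    # (B, h, w, 2)
    return grid.view(1, h, w, 2) + flow                        # sampling grid


def warp_appearance(feat_source, kp_source, kp_driving):
    """Warp source appearance features toward the driving pose."""
    _, _, h, w = feat_source.shape
    sampling_grid = sparse_to_dense_motion(kp_source, kp_driving, h, w)
    return F.grid_sample(feat_source, sampling_grid, align_corners=True)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)       # encoder features of the source image
    kp_src = torch.rand(1, 10, 2) * 2 - 1   # 10 unsupervised keypoints, source
    kp_drv = torch.rand(1, 10, 2) * 2 - 1   # 10 keypoints, driving frame
    print(warp_appearance(feat, kp_src, kp_drv).shape)  # (1, 64, 32, 32)
```

In actual unsupervised animation pipelines the per-keypoint motion is typically a learned first-order (affine) transformation rather than a pure translation, and an occlusion map modulates the warped features before decoding.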
Related papers
- High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [89.29655924125461]
We propose a novel landmark-based diffusion model for talking face generation.
We first establish the less ambiguous mapping from audio to landmark motion of lip and jaw.
Then, we introduce an innovative conditioning module called TalkFormer to align the synthesized motion with the motion represented by landmarks.
arXiv Detail & Related papers (2024-08-10T02:58:28Z) - Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method to control face expression deformation based on driven audio.
Our experiments show that our method is superior to state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z) - High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression.
We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion.
We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Correcting Face Distortion in Wide-Angle Videos [85.88898349347149]
We present a video warping algorithm to correct these distortions.
Our key idea is to apply stereographic projection locally on the facial regions.
For performance evaluation, we develop a wide-angle video dataset with a wide range of focal lengths.
arXiv Detail & Related papers (2021-11-18T21:28:17Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; in particular, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No.1 in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.