OPT: One-shot Pose-Controllable Talking Head Generation
- URL: http://arxiv.org/abs/2302.08197v1
- Date: Thu, 16 Feb 2023 10:26:52 GMT
- Title: OPT: One-shot Pose-Controllable Talking Head Generation
- Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han
- Abstract summary: One-shot talking head generation produces lip-sync talking heads based on arbitrary audio and one source face.
We present the One-shot Pose-controllable Talking head generation network (OPT).
OPT generates high-quality pose-controllable talking heads with no identity mismatch problem, outperforming previous SOTA methods.
- Score: 14.205344055665414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot talking head generation produces lip-sync talking heads based on
arbitrary audio and one source face. To guarantee the naturalness and realness,
recent methods propose to achieve free pose control instead of simply editing
mouth areas. However, existing methods do not preserve accurate identity of
source face when generating head motions. To solve the identity mismatch
problem and achieve high-quality free pose control, we present One-shot
Pose-controllable Talking head generation network (OPT). Specifically, the
Audio Feature Disentanglement Module separates content features from audios,
eliminating the influence of speaker-specific information contained in
arbitrary driving audios. Later, the mouth expression feature is extracted from
the content feature and source face, during which the landmark loss is designed
to enhance the accuracy of facial structure and identity preserving quality.
Finally, to achieve free pose control, controllable head pose features from
reference videos are fed into the Video Generator along with the expression
feature and source face to generate new talking heads. Extensive quantitative
and qualitative experimental results verify that OPT generates high-quality
pose-controllable talking heads with no identity mismatch problem,
outperforming previous SOTA methods.
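The abstract names a landmark loss but does not give its formula. As a minimal sketch only, assuming the loss is the mean Euclidean distance between predicted and ground-truth 2D facial landmarks (an assumption, not the authors' stated definition), it could look like the following. The function name `landmark_loss` and the `Point` alias are hypothetical and introduced for illustration.

```python
from math import sqrt
from typing import Sequence, Tuple

# Hypothetical alias: one 2D facial landmark as (x, y).
Point = Tuple[float, float]


def landmark_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding landmarks.

    ASSUMPTION: the paper's abstract does not specify the loss; this
    L2-distance form is a common choice used here only as an example.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += sqrt((px - tx) ** 2 + (py - ty) ** 2)
    return total / len(pred)


# Usage: two landmarks, one exact match and one off by a 3-4-5 triangle.
predicted = [(0.0, 0.0), (3.0, 4.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
print(landmark_loss(predicted, ground_truth))  # 2.5
```

A lower value means the generated face's landmark layout is closer to the reference, which is the property the abstract ties to facial structure accuracy and identity preservation.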
Related papers
- PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation [17.158581488104186]
Previous audio-driven talking head generation (THG) methods generate head poses from driving audio.
We propose PoseTalk, a THG system that can freely generate lip-synchronized talking head videos with free head poses conditioned on text prompts and audio.
arXiv Detail & Related papers (2024-09-04T12:30:25Z)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method that controls facial expression deformation based on the driving audio.
Our experiments show that our method outperforms state-of-the-art methods on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model [14.220727407255966]
Responsive listening head generation is an important task that aims to model face-to-face communication scenarios.
We propose the Multi-Faceted Responsive Listening Head Generation Network (MFR-Net).
arXiv Detail & Related papers (2023-08-31T11:10:28Z)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [14.205344055665414]
FONT is a Flow-guided One-shot model that achieves NaTural head motions over generated talking heads.
A head pose prediction module is designed to generate head pose sequences from the source face and driving audio.
arXiv Detail & Related papers (2023-03-31T03:25:06Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [91.00397473678088]
Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions.
We propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality.
Our model can generate high-fidelity lip-synced results for arbitrary subjects.
arXiv Detail & Related papers (2022-12-09T16:32:46Z)
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [69.9557427451339]
We propose a framework based on neural radiance fields to pursue high-fidelity talking head generation.
Specifically, the neural radiance field takes lip movement features and personalized attributes as two disentangled conditions.
We show that our method achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2022-01-03T18:23:38Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z)
- HeadGAN: One-shot Neural Head Synthesis and Editing [70.30831163311296]
HeadGAN is a system that conditions synthesis on 3D face representations, which can be adapted to the facial geometry of any reference image.
The 3D face representation enables HeadGAN to be further used as an efficient method for compression and reconstruction and a tool for expression and pose editing.
arXiv Detail & Related papers (2020-12-15T12:51:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.