Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
- URL: http://arxiv.org/abs/2401.08503v3
- Date: Sat, 23 Mar 2024 06:40:22 GMT
- Title: Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
- Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao,
- Abstract summary: One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Potrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
- Score: 88.17520303867099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of accurate 3D avatar reconstruction and stable talking face animation. Besides, while the existing works mainly focus on synthesizing the head part, it is also vital to generate natural torso and background segments to obtain a realistic talking portrait video. To address these limitations, we present Real3D-Potrait, a framework that (1) improves the one-shot 3D reconstruction power with a large image-to-plane model that distills 3D prior knowledge from a 3D face generative model; (2) facilitates accurate motion-conditioned animation with an efficient motion adapter; (3) synthesizes realistic video with natural torso movement and switchable background using a head-torso-background super-resolution model; and (4) supports one-shot audio-driven talking face generation with a generalizable audio-to-motion model. Extensive experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos compared to previous methods. Video samples and source code are available at https://real3dportrait.github.io .
Related papers
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which enables to produce high-quality 3D-aware talking head.
Our method can craft a 3D-consistent facial feature space corresponding to a single image.
We also introduce LipaintNet that can replenish the lacking information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z) - 3D-Aware Talking-Head Video Motion Transfer [20.135083791297603]
We propose a 3D-aware talking-head video motion transfer network, Head3D.
Head3D exploits the subject appearance information by generating a visually-interpretable 3D canonical head from the 2D subject frames.
Our experiments on two public talking-head video datasets demonstrate that Head3D outperforms both 2D and 3D prior arts.
arXiv Detail & Related papers (2023-11-05T02:50:45Z) - DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with
Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
It simultaneously achieves more realistic facial animation than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z) - Audio-Driven 3D Facial Animation from In-the-Wild Videos [16.76533748243908]
Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head.
Existing methods typically rely on training their models using limited public 3D datasets that contain a restricted number of audio-3D scan pairs.
We propose a novel method that leverages in-the-wild 2D talking-head videos to train our 3D facial animation model.
arXiv Detail & Related papers (2023-06-20T13:53:05Z) - PV3D: A 3D Generative Model for Portrait Video Generation [94.96025739097922]
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z) - SadTalker: Learning Realistic 3D Motion Coefficients for Stylized
Audio-Driven Single Image Talking Face Animation [33.651156455111916]
We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio.
Precisely, we present ExpNet to learn the accurate facial expression from audio by distilling both coefficients and 3D-rendered faces.
arXiv Detail & Related papers (2022-11-22T11:35:07Z) - AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars [71.00322191446203]
2D generative models often suffer from undesirable artifacts when rendering images from different camera viewpoints.
Recently, 3D-aware GANs extend 2D GANs for explicit disentanglement of camera pose by leveraging 3D scene representations.
We propose an animatable 3D-aware GAN for multiview consistent face animation generation.
arXiv Detail & Related papers (2022-10-12T17:59:56Z) - 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [13.305263646852087]
We introduce 3D-TalkEmo, a deep neural network that generates 3D talking head animation with various emotions.
We also create a large 3D dataset with synchronized audios and videos, rich corpus, as well as various emotion states of different persons.
arXiv Detail & Related papers (2021-04-25T02:48:19Z) - Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z) - Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
We outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.