Related papers: Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

URL: http://arxiv.org/abs/2401.08503v3
Date: Sat, 23 Mar 2024 06:40:22 GMT
Title: Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao
Abstract summary: One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio. We present Real3D-Portrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model. Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
Score: 88.17520303867099
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of accurate 3D avatar reconstruction and stable talking face animation. Besides, while the existing works mainly focus on synthesizing the head part, it is also vital to generate natural torso and background segments to obtain a realistic talking portrait video. To address these limitations, we present Real3D-Portrait, a framework that (1) improves the one-shot 3D reconstruction power with a large image-to-plane model that distills 3D prior knowledge from a 3D face generative model; (2) facilitates accurate motion-conditioned animation with an efficient motion adapter; (3) synthesizes realistic video with natural torso movement and switchable background using a head-torso-background super-resolution model; and (4) supports one-shot audio-driven talking face generation with a generalizable audio-to-motion model. Extensive experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos compared to previous methods. Video samples and source code are available at https://real3dportrait.github.io .

Related papers

From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors [49.37666175170832]
We introduce SuperHead, a framework for enhancing low-resolution, animatable 3D head avatars. SuperHead synthesizes high-quality geometry and textures, while ensuring both 3D and temporal consistency. Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions.
arXiv Detail & Related papers (2026-02-05T19:00:50Z)
Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting [6.62155043692653]
Talking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specifics such as warping-based facial motion representation priors to animate talking motions. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis.
arXiv Detail & Related papers (2026-01-26T16:06:57Z)
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image [27.76629170122787]
VASA-3D is an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details present in real human faces, and reconstructing an intricate 3D head avatar from a single portrait image.
arXiv Detail & Related papers (2025-12-16T18:44:00Z)
Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes [49.26872036160368]
We propose a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes.
arXiv Detail & Related papers (2024-11-28T16:01:58Z)
NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which can produce high-quality 3D-aware talking heads. Our method can craft a 3D-consistent facial feature space corresponding to a single image. We also introduce LipaintNet, which can replenish the missing information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z)
3D-Aware Talking-Head Video Motion Transfer [20.135083791297603]
We propose a 3D-aware talking-head video motion transfer network, Head3D. Head3D exploits the subject appearance information by generating a visually-interpretable 3D canonical head from the 2D subject frames. Our experiments on two public talking-head video datasets demonstrate that Head3D outperforms both 2D and 3D prior arts.
arXiv Detail & Related papers (2023-11-05T02:50:45Z)
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis. It captures the complex one-to-many relationships between speech and 3D face based on diffusion. It simultaneously achieves more realistic facial animation than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
Audio-Driven 3D Facial Animation from In-the-Wild Videos [16.76533748243908]
Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head. Existing methods typically rely on training their models using limited public 3D datasets that contain a restricted number of audio-3D scan pairs. We propose a novel method that leverages in-the-wild 2D talking-head videos to train our 3D facial animation model.
arXiv Detail & Related papers (2023-06-20T13:53:05Z)
PV3D: A 3D Generative Model for Portrait Video Generation [94.96025739097922]
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos. PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z)
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [33.651156455111916]
We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio. Precisely, we present ExpNet to learn the accurate facial expression from audio by distilling both coefficients and 3D-rendered faces.
arXiv Detail & Related papers (2022-11-22T11:35:07Z)
AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars [71.00322191446203]
2D generative models often suffer from undesirable artifacts when rendering images from different camera viewpoints. Recently, 3D-aware GANs extend 2D GANs for explicit disentanglement of camera pose by leveraging 3D scene representations. We propose an animatable 3D-aware GAN for multiview consistent face animation generation.
arXiv Detail & Related papers (2022-10-12T17:59:56Z)
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [13.305263646852087]
We introduce 3D-TalkEmo, a deep neural network that generates 3D talking head animation with various emotions. We also create a large 3D dataset with synchronized audios and videos, rich corpus, as well as various emotion states of different persons.
arXiv Detail & Related papers (2021-04-25T02:48:19Z)
Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking. Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input. The model outputs a synthesized high-quality talking face video with personalized head pose. Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.