AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
- URL: http://arxiv.org/abs/2403.17694v1
- Date: Tue, 26 Mar 2024 13:35:02 GMT
- Title: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
- Authors: Huawei Wei, Zejun Yang, Zhisheng Wang
- Abstract summary: We propose AniPortrait, a framework for generating high-quality animation driven by audio and a reference portrait image.
Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality.
Our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment.
- Score: 4.568539181254851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology exhibits considerable potential in terms of flexibility and controllability, which can be effectively applied in areas such as facial motion editing or face reenactment. We release code and model weights at https://github.com/scutzzj/AniPortrait
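The first stage described in the abstract ends by projecting 3D intermediate representations into 2D facial landmarks. The abstract does not say which camera model or pose parameterization is used, so the following is only a minimal sketch under assumed conventions: a pinhole camera, a head pose given as yaw and pitch angles plus a translation, and hypothetical helper names (`rotation_yaw_pitch`, `project_points`) that are not taken from the AniPortrait code.

```python
import math


def rotation_yaw_pitch(yaw, pitch):
    """Build a 3x3 rotation matrix: yaw about the Y axis, then pitch about the X axis.

    Angles are in radians. This pose parameterization is an assumption
    made for illustration only.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    # Combined rotation R = Rx * Ry (yaw is applied to the point first).
    return [
        [sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def project_points(points_3d, rotation, translation, focal, center):
    """Project 3D points to 2D pixel coordinates with a pinhole camera.

    points_3d:   list of (x, y, z) tuples in head coordinates
    rotation:    3x3 matrix as nested lists
    translation: (tx, ty, tz) moving the head in front of the camera
    focal:       focal length in pixels (assumed equal for x and y)
    center:      (cx, cy) principal point in pixels
    """
    landmarks_2d = []
    for p in points_3d:
        # Rigid transform into camera coordinates.
        x, y, z = (
            sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        if z <= 0:
            raise ValueError("point is behind the camera")
        # Perspective divide, then shift to the image center.
        landmarks_2d.append((focal * x / z + center[0], focal * y / z + center[1]))
    return landmarks_2d


# Usage: one frame with two hypothetical 3D points and a frontal pose.
frame = project_points(
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    rotation_yaw_pitch(0.0, 0.0),
    translation=(0.0, 0.0, 2.0),
    focal=500.0,
    center=(256.0, 256.0),
)
```

Calling `project_points` once per audio frame would yield the kind of 2D landmark sequence that the second stage, the diffusion model with a motion module, takes as its conditioning input.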
Related papers
- Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation [53.767090490974745]
Follow-Your-Emoji is a diffusion-based framework for portrait animation.
It animates a reference portrait with target landmark sequences.
Our method demonstrates significant performance in controlling the expression of freestyle portraits.
arXiv Detail & Related papers (2024-06-04T02:05:57Z)
- EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [36.96390906514729]
The MegaPortraits model has demonstrated state-of-the-art results in this domain.
We introduce our EMOPortraits model, where we enhance the model's capability to faithfully support intense, asymmetric face expressions.
We propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions.
arXiv Detail & Related papers (2024-04-29T21:23:29Z)
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Portrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z)
- AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections [78.81539337399391]
We present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements.
It is a generative model trained on unstructured 2D image collections without using 3D or video data.
A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces.
arXiv Detail & Related papers (2023-09-05T12:44:57Z)
- MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions [15.626317162430087]
We propose a unified system for multi-person, diverse, and high-fidelity talking portrait generation.
Our method contains three stages; in the first, a Mapping-Once network with Dual Attentions (MODA) generates a talking representation from the given audio.
The proposed system produces more natural and realistic video portraits compared to previous methods.
arXiv Detail & Related papers (2023-07-19T14:45:11Z)
- PV3D: A 3D Generative Model for Portrait Video Generation [94.96025739097922]
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z)
- Geometry Driven Progressive Warping for One-Shot Face Animation [5.349852254138086]
Face animation aims at creating photo-realistic portrait videos with animated poses and expressions.
We present a geometry driven model and propose two geometric patterns as guidance: 3D face rendered displacement maps and posed neural codes.
We show that the proposed model can synthesize portrait videos with high fidelity and achieve the new state-of-the-art results on the VoxCeleb1 and VoxCeleb2 datasets.
arXiv Detail & Related papers (2022-10-05T17:07:06Z)
- Explicitly Controllable 3D-Aware Portrait Generation [42.30481422714532]
We propose a 3D portrait generation network that produces consistent portraits according to semantic parameters regarding pose, identity, expression and lighting.
Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions in natural lighting when viewed from free viewpoints.
arXiv Detail & Related papers (2022-09-12T17:40:08Z)
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)