Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking
Portrait Synthesis
- URL: http://arxiv.org/abs/2307.09323v2
- Date: Thu, 24 Aug 2023 11:25:43 GMT
- Title: Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking
Portrait Synthesis
- Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, Jun Zhou, Lin Gu
- Abstract summary: ER-NeRF is a conditional Neural Radiance Fields (NeRF) based architecture for talking portrait.
Our method renders better high-fidelity and audio-lips, with realistic details and high efficiency compared to previous methods.
- Score: 20.111316792226482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents ER-NeRF, a novel conditional Neural Radiance Fields
(NeRF) based architecture for talking portrait synthesis that can concurrently
achieve fast convergence, real-time rendering, and state-of-the-art performance
with small model size. Our idea is to explicitly exploit the unequal
contribution of spatial regions to guide talking portrait modeling.
Specifically, to improve the accuracy of dynamic head reconstruction, a compact
and expressive NeRF-based Tri-Plane Hash Representation is introduced by
pruning empty spatial regions with three planar hash encoders. For speech
audio, we propose a Region Attention Module to generate region-aware condition
feature via an attention mechanism. Different from existing methods that
utilize an MLP-based encoder to learn the cross-modal relation implicitly, the
attention mechanism builds an explicit connection between audio features and
spatial regions to capture the priors of local motions. Moreover, a direct and
fast Adaptive Pose Encoding is introduced to optimize the head-torso separation
problem by mapping the complex transformation of the head pose into spatial
coordinates. Extensive experiments demonstrate that our method renders better
high-fidelity and audio-lips synchronized talking portrait videos, with
realistic details and high efficiency compared to previous methods.
Related papers
- Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity
Monocular Dense Mapping [51.739466714312805]
We introduce Hi-Map, a novel monocular dense mapping approach based on Neural Radiance Field (NeRF)
Hi-Map is exceptional in its capacity to achieve efficient and high-fidelity mapping using only posed RGB inputs.
arXiv Detail & Related papers (2024-01-06T12:32:25Z) - PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields [54.8553158441296]
We propose a novel visual localization framework, ie, PNeRFLoc, based on a unified point-based representation.
On the one hand, PNeRFLoc supports the initial pose estimation by matching 2D and 3D feature points.
On the other hand, it also enables pose refinement with novel view synthesis using rendering-based optimization.
arXiv Detail & Related papers (2023-12-17T08:30:00Z) - FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z) - VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for
Enhanced Indoor View Synthesis [51.49008959209671]
We introduce VoxNeRF, a novel approach that leverages volumetric representations to enhance the quality and efficiency of indoor view synthesis.
We employ multi-resolution hash grids to adaptively capture spatial features, effectively managing occlusions and the intricate geometry of indoor scenes.
We validate our approach against three public indoor datasets and demonstrate that VoxNeRF outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-11-09T11:32:49Z) - DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for
High-Fidelity Talking Portrait Synthesis [15.674126345649913]
We present the triplane-hash neural radiance fields (DT-NeRF) framework for photorealistic rendering of talking faces.
Our architecture decomposes the facial region into two specialized triplanes: one specialized for representing the mouth, and the other for the broader facial features.
arXiv Detail & Related papers (2023-09-14T14:39:05Z) - Grid-guided Neural Radiance Fields for Large Urban Scenes [146.06368329445857]
Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually.
An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene.
We present a new framework that realizes high-fidelity rendering on large urban scenes while being computationally efficient.
arXiv Detail & Related papers (2023-03-24T13:56:45Z) - Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial
Decomposition [61.6677901687009]
We propose an efficient NeRF-based framework that enables real-time synthesizing of talking portraits.
Our method can generate realistic and audio-lips synchronized talking portrait videos.
arXiv Detail & Related papers (2022-11-22T16:03:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.