GAT-NeRF: Geometry-Aware-Transformer Enhanced Neural Radiance Fields for High-Fidelity 4D Facial Avatars
- URL: http://arxiv.org/abs/2601.14875v1
- Date: Wed, 21 Jan 2026 11:05:13 GMT
- Title: GAT-NeRF: Geometry-Aware-Transformer Enhanced Neural Radiance Fields for High-Fidelity 4D Facial Avatars
- Authors: Zhe Chang, Haodong Jin, Ying Sun, Yan Song, Hui Yu,
- Abstract summary: We present Geometry-Aware-Transformer Enhanced NeRF (GAT-NeRF) for high-fidelity and controllable 4D facial avatar reconstruction.<n>GAT-NeRF integrates the Transformer mechanism into the Neural Radiance Fields (NeRF) pipeline.<n>Experiments unequivocally demonstrate GAT-NeRF's state-of-the-art performance in visual fidelity and high-frequency detail recovery.
- Score: 11.047907356679746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-fidelity 4D dynamic facial avatar reconstruction from monocular video is a critical yet challenging task, driven by increasing demands for immersive virtual human applications. While Neural Radiance Fields (NeRF) have advanced scene representation, their capacity to capture high-frequency facial details, such as dynamic wrinkles and subtle textures from information-constrained monocular streams, requires significant enhancement. To tackle this challenge, we propose a novel hybrid neural radiance field framework, called Geometry-Aware-Transformer Enhanced NeRF (GAT-NeRF) for high-fidelity and controllable 4D facial avatar reconstruction, which integrates the Transformer mechanism into the NeRF pipeline. GAT-NeRF synergistically combines a coordinate-aligned Multilayer Perceptron (MLP) with a lightweight Transformer module, termed as Geometry-Aware-Transformer (GAT) due to its processing of multi-modal inputs containing explicit geometric priors. The GAT module is enabled by fusing multi-modal input features, including 3D spatial coordinates, 3D Morphable Model (3DMM) expression parameters, and learnable latent codes to effectively learn and enhance feature representations pertinent to fine-grained geometry. The Transformer's effective feature learning capabilities are leveraged to significantly augment the modeling of complex local facial patterns like dynamic wrinkles and acne scars. Comprehensive experiments unequivocally demonstrate GAT-NeRF's state-of-the-art performance in visual fidelity and high-frequency detail recovery, forging new pathways for creating realistic dynamic digital humans for multimedia applications.
Related papers
- Multispectral-NeRF:a multispectral modeling approach based on neural radiance fields [3.606065291262699]
3D reconstruction techniques based on 2D images typically rely on RGB spectral information.<n>Additional spectral bands beyond RGB have been increasingly incorporated into 3D reconstruction.<n>Existing methods that integrate these expanded spectral data often suffer from expensive scheme prices, low accuracy and poor geometric features.<n>We propose Multispectral-NeRF, an enhanced neural architecture derived from NeRF that can effectively integrate multispectral information.
arXiv Detail & Related papers (2025-09-14T09:04:35Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM)<n>It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene.<n>It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment.<n>We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z) - DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery [2.1653492349540784]
DeforHMR is a novel regression-based monocular HMR framework designed to enhance the prediction of human pose parameters.
DeforHMR leverages a novel query-agnostic deformable cross-attention mechanism within the transformer decoder.
It achieves state-of-the-art performance for single-frame regression-based methods on the widely used 3D HMR benchmarks 3DPW and RICH.
arXiv Detail & Related papers (2024-11-18T00:46:59Z) - Learning Personalized High Quality Volumetric Head Avatars from
Monocular RGB Videos [47.94545609011594]
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild.
Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism.
arXiv Detail & Related papers (2023-04-04T01:10:04Z) - NeRFMeshing: Distilling Neural Radiance Fields into
Geometrically-Accurate 3D Meshes [56.31855837632735]
We propose a compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach.
Our final 3D mesh is physically accurate and can be rendered in real time on an array of devices.
arXiv Detail & Related papers (2023-03-16T16:06:03Z) - NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real
Image Animation [66.0838349951456]
Nerf-based Generative models have shown impressive capacity in generating high-quality images with consistent 3D geometry.
We propose a universal method to surgically fine-tune these NeRF-GAN models in order to achieve high-fidelity animation of real subjects only by a single image.
arXiv Detail & Related papers (2022-11-30T18:36:45Z) - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head
Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z) - HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars [65.82222842213577]
We propose a novel neural rendering pipeline, which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality.
First, we learn to encode articulated human motions on a dense UV manifold of the human body surface.
We then leverage the encoded information on the UV manifold to construct a 3D volumetric representation.
arXiv Detail & Related papers (2021-12-19T17:34:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.