GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction
- URL: http://arxiv.org/abs/2602.24161v1
- Date: Fri, 27 Feb 2026 16:41:21 GMT
- Title: GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction
- Authors: Chao Xu, Xiaochen Zhao, Xiang Deng, Jingxiang Sun, Zhuo Su, Donglin Di, Yebin Liu
- Abstract summary: We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free expression encoder captures implicit expression representations. Our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization.
- Score: 49.70452913749897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing photorealistic and animatable 4D head avatars from a single portrait image remains a fundamental challenge in computer vision. While diffusion models have enabled remarkable progress in image and video generation for avatar reconstruction, existing methods primarily rely on 2D priors and struggle to achieve consistent 3D geometry. We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free expression encoder captures implicit expression representations. Both synthesized images and expression latents are incorporated into 3D Gaussian-based avatars, enabling photorealistic rendering with accurate geometry. Extensive experiments demonstrate that our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization, while supporting real-time rendering.
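To make the described pipeline more concrete, below is a minimal, hypothetical PyTorch sketch of its two conditioning components: a pose-free expression encoder and a geometry-aware denoiser that jointly handles portrait RGB and surface-normal channels. All module names, layer choices, tensor shapes, and the six-channel RGB+normal concatenation are illustrative assumptions, not details from the paper; the 3D Gaussian avatar assembly and real-time rendering stages are omitted.

```python
# Hypothetical sketch of the conditioning pipeline described in the abstract.
# Shapes, layer choices, and the 6-channel (RGB + normals) layout are assumptions.
import torch
import torch.nn as nn


class ExpressionEncoder(nn.Module):
    """Pose-free encoder: portrait image -> implicit expression latent."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, latent_dim)

    def forward(self, portrait: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(portrait).flatten(1)
        return self.head(feat)


class GeometryAwareDenoiser(nn.Module):
    """Joint denoiser over concatenated RGB + surface-normal channels (6 ch),
    conditioned on the expression latent via a simple additive embedding."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.cond = nn.Linear(latent_dim, 6)
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 6, kernel_size=3, padding=1),
        )

    def forward(self, noisy: torch.Tensor, z_expr: torch.Tensor) -> torch.Tensor:
        bias = self.cond(z_expr)[:, :, None, None]  # broadcast over H, W
        return self.net(noisy + bias)               # predicts the noise residual


if __name__ == "__main__":
    portrait = torch.randn(1, 3, 64, 64)            # single input portrait
    noisy_img_normals = torch.randn(1, 6, 64, 64)   # noised RGB + normal maps
    z = ExpressionEncoder()(portrait)
    eps_hat = GeometryAwareDenoiser()(noisy_img_normals, z)
    print(eps_hat.shape)  # torch.Size([1, 6, 64, 64])
```

In this toy version, denoising the image and normal channels with a shared network is what ties appearance to geometry; the resulting synthesized views and expression latents would then drive a 3D Gaussian avatar, as the abstract describes.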
Related papers
- Self-Evolving 3D Scene Generation from a Single Image [44.87957263540352]
EvoScene is a training-free framework that progressively reconstructs complete 3D scenes from single images. EvoScene alternates between 2D and 3D domains, gradually improving both structure and appearance.
arXiv Detail & Related papers (2025-12-09T18:44:21Z) - MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction [65.5412504339528]
MoGA is a novel method to reconstruct high-fidelity 3D Gaussian avatars from a single-view image. Our method surpasses state-of-the-art techniques and generalizes well to real-world scenarios.
arXiv Detail & Related papers (2025-07-31T14:36:24Z) - HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field [44.848368616444446]
We introduce a novel hybrid explicit-implicit 3D representation, Facial Model Conditioned Neural Radiance Field, which integrates the expressiveness of NeRF and the prior information from the parametric template.
By adopting an overall GAN-based architecture using an image-to-image translation network, we achieve high-resolution, realistic and view-consistent synthesis of dynamic head appearance.
arXiv Detail & Related papers (2023-09-29T10:45:22Z) - Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z) - Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z) - SIRA: Relightable Avatars from a Single Image [19.69326772087838]
We introduce SIRA, a method that reconstructs human head avatars with high-fidelity geometry and factorized lights and surface materials.
Our key ingredients are two data-driven statistical models based on neural fields that resolve the ambiguities of single-view 3D surface reconstruction and appearance factorization.
arXiv Detail & Related papers (2022-09-07T09:47:46Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z) - Deep 3D Portrait from a Single Image [54.634207317528364]
We present a learning-based approach for recovering the 3D geometry of a human head from a single portrait image.
A two-step geometry learning scheme is proposed to learn 3D head reconstruction from in-the-wild face images.
We evaluate the accuracy of our method both in 3D and with pose manipulation tasks on 2D images.
arXiv Detail & Related papers (2020-04-24T08:55:37Z) - AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild" [105.28776215113352]
AvatarMe is the first method that is able to reconstruct photorealistic 3D faces from a single "in-the-wild" image with an increasing level of detail.
It outperforms the existing arts by a significant margin and reconstructs authentic, 4K by 6K-resolution 3D faces from a single low-resolution image.
arXiv Detail & Related papers (2020-03-30T22:17:54Z)