GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals
- URL: http://arxiv.org/abs/2509.02141v1
- Date: Tue, 02 Sep 2025 09:43:47 GMT
- Title: GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals
- Authors: Mohit Mendiratta, Mayur Deshmukh, Kartik Teotia, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt
- Abstract summary: 3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components. GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.
- Score: 78.67749748078813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR, but traditional PCA-based mesh models are limited in resolution, detail, and photorealism. Neural volumetric methods improve realism but remain too slow for interactive use. Recent Gaussian Splatting (3DGS) based facial models achieve fast, high-quality rendering but still depend solely on a mesh-based 3DMM prior for expression control, limiting their ability to capture fine-grained geometry, expressions, and full-head coverage. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components, additive refinements that recover high-frequency details such as wrinkles, fine skin texture, and hairline variations. GRMM provides disentangled control through low-dimensional, interpretable parameters (e.g., identity shape, facial expressions) while separately modelling residuals that capture subject- and expression-specific detail beyond the base model's capacity. Coarse decoders produce vertex-level mesh deformations, fine decoders represent per-Gaussian appearance, and a lightweight CNN refines rasterised images for enhanced realism, all while maintaining 75 FPS real-time rendering. To learn consistent, high-fidelity residuals, we present EXPRESS-50, the first dataset with 60 aligned expressions across 50 identities, enabling robust disentanglement of identity and expression in Gaussian-based 3DMMs. Across monocular 3D face reconstruction, novel-view synthesis, and expression transfer, GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.
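The abstract's core design, a low-dimensional linear 3DMM augmented with an additive learned residual, can be illustrated with a minimal numerical sketch. All dimensions, bases, and the placeholder decoder below are hypothetical (GRMM's coarse and fine decoders are learned networks, not the toy maps used here); the sketch only shows how a coarse parametric mesh composes with a vertex-level residual while the parameters stay interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: V mesh vertices, low-dimensional identity/expression codes.
V, D_ID, D_EXP = 5023, 100, 50

# Base linear 3DMM: mean shape plus identity and expression blendshape bases.
mean_shape = rng.standard_normal((V, 3))
id_basis   = rng.standard_normal((V, 3, D_ID)) * 0.01
exp_basis  = rng.standard_normal((V, 3, D_EXP)) * 0.01

def base_3dmm(z_id, z_exp):
    """Coarse mesh from interpretable low-dimensional parameters."""
    return mean_shape + id_basis @ z_id + exp_basis @ z_exp

def coarse_residual(z_id, z_exp):
    """Stand-in for a learned coarse decoder: a small vertex-level
    deformation conditioned on the same identity/expression codes,
    meant to recover detail beyond the linear model's capacity."""
    return 0.001 * np.tanh(base_3dmm(z_id, z_exp))

z_id  = rng.standard_normal(D_ID)
z_exp = rng.standard_normal(D_EXP)

# Final geometry = interpretable base model + additive residual refinement.
verts = base_3dmm(z_id, z_exp) + coarse_residual(z_id, z_exp)
print(verts.shape)  # (5023, 3)
```

In the full pipeline this vertex-level output would seed per-Gaussian appearance decoding and a CNN refinement pass over the rasterised image; the point of the sketch is only the additive base-plus-residual decomposition.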
Related papers
- KaoLRM: Repurposing Pre-trained Large Reconstruction Models for Parametric 3D Face Reconstruction [51.67605823241639]
KaoLRM re-targets the learned prior of the Large Reconstruction Model (LRM) for parametric 3D face reconstruction from single-view images. Experiments on both controlled and in-the-wild benchmarks demonstrate that KaoLRM achieves superior reconstruction accuracy and cross-view consistency.
arXiv Detail & Related papers (2026-01-19T05:36:59Z) - EGG-Fusion: Efficient 3D Reconstruction with Geometry-aware Gaussian Surfel on the Fly [8.803716785929936]
EGG-Fusion is a novel differentiable-rendering-based real-time reconstruction system. The proposed system achieves a surface reconstruction error of 0.6 cm, representing over 20% improvement in accuracy compared to state-of-the-art methods. Notably, the system maintains real-time processing capabilities at 24 FPS, establishing it as one of the most accurate differentiable-rendering-based real-time reconstruction systems.
arXiv Detail & Related papers (2025-12-01T05:32:17Z) - ARMesh: Autoregressive Mesh Generation via Next-Level-of-Detail Prediction [45.699110709239996]
We propose generating 3D meshes auto-regressively in a progressive coarse-to-fine manner. Specifically, we draw on mesh simplification algorithms, which gradually merge mesh faces to build simpler meshes, and reverse this process to predict the next level of detail. Our experiments show that this novel progressive mesh generation approach provides intuitive control over generation quality and time consumption.
arXiv Detail & Related papers (2025-09-25T07:12:02Z) - EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors [31.25607301318426]
High-fidelity head avatar reconstruction plays a crucial role in AR/VR, gaming, and multimedia content creation. Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated effectiveness in modeling complex geometry with real-time rendering capability. We propose a novel 3DGS-based framework termed EAvatar for head reconstruction that is both expression-aware and deformation-aware.
arXiv Detail & Related papers (2025-08-19T05:56:00Z) - MoGaFace: Momentum-Guided and Texture-Aware Gaussian Avatars for Consistent Facial Geometry [3.0373043721834163]
MoGaFace is a novel 3D head avatar modeling framework that continuously refines facial geometry and texture attributes. MoGaFace achieves high-fidelity head avatar reconstruction and significantly improves novel-view synthesis quality.
arXiv Detail & Related papers (2025-08-02T06:25:51Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting [27.081250446161114]
This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to address this challenge.
MaGS constrains 3D Gaussians to roam near the mesh, creating a mutually adsorbed mesh-Gaussian 3D representation.
Such representation harnesses both the rendering flexibility of 3D Gaussians and the structured property of meshes.
arXiv Detail & Related papers (2024-06-03T17:59:51Z) - InstantSplat: Sparse-view Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - Hybrid Explicit Representation for Ultra-Realistic Head Avatars [55.829497543262214]
We introduce a novel approach to creating ultra-realistic head avatars and rendering them in real-time. A UV-mapped 3D mesh is utilized to capture sharp and rich textures on smooth surfaces, while 3D Gaussian Splatting is employed to represent complex geometric structures. Experiments show that our modeled results exceed those of state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T04:01:26Z) - CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model [37.75256020559125]
We present a high-fidelity feed-forward single image-to-3D generative model.
We highlight the necessity of integrating geometric priors into network design.
Our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.
arXiv Detail & Related papers (2024-03-08T04:25:29Z) - Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thus amplifying the controllability and disentanglement capabilities of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and Optimizing for Tuning (TRIOT) method to train a conditional normalized flow module.
Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z) - Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.