Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image
- URL: http://arxiv.org/abs/2512.17773v1
- Date: Fri, 19 Dec 2025 16:44:32 GMT
- Title: Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image
- Authors: Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Zhe Chen, Matthias Nießner,
- Abstract summary: We propose Pix2NPHM, a vision transformer that regresses NPHM parameters, given a single image as input.<n>Compared to existing approaches, the neural space allows our method to reconstruct more recognizable facial geometry.<n>We achieve unprecedented face reconstruction quality that can run at scale on in-the-wild data.
- Score: 48.80951099813609
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural Parametric Head Models (NPHMs) are a recent advancement over mesh-based 3d morphable models (3DMMs) to facilitate high-fidelity geometric detail. However, fitting NPHMs to visual inputs is notoriously challenging due to the expressive nature of their underlying latent space. To this end, we propose Pix2NPHM, a vision transformer (ViT) network that directly regresses NPHM parameters, given a single image as input. Compared to existing approaches, the neural parametric space allows our method to reconstruct more recognizable facial geometry and accurate facial expressions. For broad generalization, we exploit domain-specific ViTs as backbones, which are pretrained on geometric prediction tasks. We train Pix2NPHM on a mixture of 3D data, including a total of over 100K NPHM registrations that enable direct supervision in SDF space, and large-scale 2D video datasets, for which normal estimates serve as pseudo ground truth geometry. Pix2NPHM not only allows for 3D reconstructions at interactive frame rates, it is also possible to improve geometric fidelity by a subsequent inference-time optimization against estimated surface normals and canonical point maps. As a result, we achieve unprecedented face reconstruction quality that can run at scale on in-the-wild data.
Related papers
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting [51.69408870574092]
We present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks.<n>Our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps.<n>WorldMirror achieves state-of-the-art performance across diverse benchmarks from camera, point map, depth, and surface normal estimation to novel view synthesis.
arXiv Detail & Related papers (2025-10-12T17:59:09Z) - Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction [46.52887358194364]
We propose Pixel3DMM, a set of highly-generalized vision transformers which predict per-pixel geometric cues.<n>We train our model by registering three high-quality 3D face datasets against the FLAME mesh topology.<n>Our method outperforms the most competitive baselines by over 15% in terms of geometric accuracy for posed facial expressions.
arXiv Detail & Related papers (2025-05-01T15:47:03Z) - DiMeR: Disentangled Mesh Reconstruction Model [29.827345186012558]
DiMeR is a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction.<n>We streamline the algorithm of mesh extraction by eliminating modules with low performance/cost ratios and redesigning regularization losses with 3D supervision.<n>Extensive experiments demonstrate that DiMeR generalises across sparse-view-, single-image-, and text-to-3D tasks, consistently outperforming baselines.
arXiv Detail & Related papers (2025-04-24T15:39:20Z) - DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details [8.849223786633083]
In 3D face modeling, 3DMM is the most widely used parametric model, but can't generate fine geometric details solely from identity and expression inputs.
We propose a neural parametric model named DNPM for the facial geometric details, which utilizes deep neural network to extract latent codes from facial displacement maps encoding details and wrinkles.
We show that DNPM and Detailed3DMM can facilitate two downstream applications: speech-driven detailed 3D facial animation and 3D face reconstruction from a degraded image.
arXiv Detail & Related papers (2024-05-30T04:57:55Z) - Flatten Anything: Unsupervised Neural Surface Parameterization [76.4422287292541]
We introduce the Flatten Anything Model (FAM), an unsupervised neural architecture to achieve global free-boundary surface parameterization.
Compared with previous methods, our FAM directly operates on discrete surface points without utilizing connectivity information.
Our FAM is fully-automated without the need for pre-cutting and can deal with highly-complex topologies.
arXiv Detail & Related papers (2024-05-23T14:39:52Z) - MonoNPHM: Dynamic Head Reconstruction from Monocular Videos [47.504979561265536]
We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos.
We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering.
arXiv Detail & Related papers (2023-12-11T17:55:05Z) - DiViNeT: 3D Reconstruction from Disparate Views via Neural Template
Regularization [7.488962492863031]
We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input.
Our key idea is to regularize the reconstruction, which is severely ill-posed and leaving significant gaps between the sparse views.
Our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views.
arXiv Detail & Related papers (2023-06-07T18:05:14Z) - LiP-Flow: Learning Inference-time Priors for Codec Avatars via
Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z) - H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z) - Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.