Related papers: VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild

VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild

URL: http://arxiv.org/abs/2511.20366v2
Date: Wed, 26 Nov 2025 05:30:04 GMT
Title: VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild
Authors: Xin Ming, Yuxuan Han, Tianyu Huang, Feng Xu,
Abstract summary: VGGTFace is an automatic approach that innovatively applies the 3D foundation model, i.e. VGGT, for topologically consistent facial geometry reconstruction.<n>Experiments demonstrate state-of-the-art results on benchmarks and impressive generalization to in-the-wild data.
Score: 19.22685211889589
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reconstructing topologically consistent facial geometry is crucial for the digital avatar creation pipelines. Existing methods either require tedious manual efforts, lack generalization to in-the-wild data, or are constrained by the limited expressiveness of 3D Morphable Models. To address these limitations, we propose VGGTFace, an automatic approach that innovatively applies the 3D foundation model, i.e. VGGT, for topologically consistent facial geometry reconstruction from in-the-wild multi-view images captured by everyday users. Our key insight is that, by leveraging VGGT, our method naturally inherits strong generalization ability and expressive power from its large-scale training and point map representation. However, it is unclear how to reconstruct a topologically consistent mesh from VGGT, as the topology information is missing in its prediction. To this end, we augment VGGT with Pixel3DMM for injecting topology information via pixel-aligned UV values. In this manner, we convert the pixel-aligned point map of VGGT to a point cloud with topology. Tailored to this point cloud with known topology, we propose a novel Topology-Aware Bundle Adjustment strategy to fuse them, where we construct a Laplacian energy for the Bundle Adjustment objective. Our method achieves high-quality reconstruction in 10 seconds for 16 views on a single NVIDIA RTX 4090. Experiments demonstrate state-of-the-art results on benchmarks and impressive generalization to in-the-wild data. Code is available at https://github.com/grignarder/vggtface.

Related papers

GPA-VGGT:Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss [15.633839321933385]
Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise in camera pose estimation and 3D reconstruction.<n>These models typically rely on ground truth labels for training, posing challenges when adapting to unlabeled and unseen scenes.<n>We propose a self-supervised framework to train VGGT with unlabeled data, thereby enhancing its localization capability in large-scale environments.
arXiv Detail & Related papers (2026-01-23T16:46:59Z)
FastVGGT: Training-Free Acceleration of Visual Geometry Transformer [83.67766078575782]
VGGT is a state-of-the-art feed-forward visual geometry model.<n>We propose FastVGGT, which leverages token merging in the 3D domain through a training-free mechanism for accelerating VGGT.<n>With 1000 input images, FastVGGT achieves a 4x speedup over VGGT while mitigating error accumulation in long-sequence scenarios.
arXiv Detail & Related papers (2025-09-02T17:54:21Z)
VGGT: Visual Geometry Grounded Transformer [61.37669770946458]
VGGT is a feed-forward neural network that directly infers all key 3D attributes of a scene.<n>Network achieves state-of-the-art results in multiple 3D tasks.
arXiv Detail & Related papers (2025-03-14T17:59:47Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained. We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
Learnable Triangulation for Deep Learning-based 3D Reconstruction of Objects of Arbitrary Topology from Single RGB Images [12.693545159861857]
We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images. The proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time.
arXiv Detail & Related papers (2021-09-24T09:44:22Z)
OSTeC: One-Shot Texture Completion [86.23018402732748]
We propose an unsupervised approach for one-shot 3D facial texture completion. The proposed approach rotates an input image in 3D and fill-in the unseen regions by reconstructing the rotated image in a 2D face generator. We frontalize the target image by projecting the completed texture into the generator.
arXiv Detail & Related papers (2020-12-30T23:53:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.