Detailed Facial Geometry Recovery from Multi-view Images by Learning an Implicit Function
- URL: http://arxiv.org/abs/2201.01016v1
- Date: Tue, 4 Jan 2022 07:24:58 GMT
- Title: Detailed Facial Geometry Recovery from Multi-view Images by Learning an Implicit Function
- Authors: Yunze Xiao, Hao Zhu, Haotian Yang, Zhengyu Diao, Xiangju Lu, Xun Cao
- Abstract summary: We propose a novel architecture to recover extremely detailed 3D faces in roughly 10 seconds.
By fitting a 3D morphable model to the multi-view images, features from the multiple images are extracted and aggregated in the mesh-attached UV space.
Our method outperforms SOTA learning-based MVS in accuracy by a large margin on the FaceScape dataset.
- Score: 12.522283941978722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering detailed facial geometry from a set of calibrated multi-view
images is valuable for its wide range of applications. Traditional multi-view
stereo (MVS) methods rely on optimization to regularize the matching cost.
Recently, learning-based methods have integrated these steps into an end-to-end
neural network and shown superior efficiency. In this paper, we propose a
novel architecture to recover extremely detailed 3D faces in roughly 10
seconds. Unlike previous learning-based methods that regularize the cost volume
via 3D CNN, we propose to learn an implicit function for regressing the
matching cost. By fitting a 3D morphable model to the multi-view images,
features from the multiple images are extracted and aggregated in the
mesh-attached UV space, which makes the implicit function more effective at recovering
detailed facial shape. Our method outperforms SOTA learning-based MVS in
accuracy by a large margin on the FaceScape dataset. The code and data will be
released soon.
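Since the code is not yet released, the sketch below only illustrates the central idea in PyTorch: replace 3D-CNN cost-volume regularization with a small MLP that regresses a matching cost from multi-view features aggregated at UV-space points of the fitted 3D morphable model. The layer sizes, the mean/variance aggregation, and the depth-offset parameterization are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class ImplicitCostFunction(nn.Module):
    """Minimal sketch: an MLP maps multi-view features aggregated at a UV
    location (plus a candidate offset along the base-mesh normal) to a
    scalar matching cost. All sizes are illustrative assumptions."""
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        # Input: mean + variance of per-view features, and a scalar offset.
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar matching cost
        )

    def forward(self, view_feats, offsets):
        # view_feats: (B, V, N, F) features sampled from V views at N UV points
        # offsets:    (B, N, 1) candidate displacement along the mesh normal
        mean = view_feats.mean(dim=1)                 # (B, N, F)
        var = view_feats.var(dim=1, unbiased=False)   # cross-view consistency cue
        x = torch.cat([mean, var, offsets], dim=-1)
        return self.mlp(x)                            # (B, N, 1)

# Usage: per UV point, keep the offset with the lowest predicted cost.
model = ImplicitCostFunction()
feats = torch.randn(1, 4, 1024, 32)   # 4 views, 1024 UV samples
offsets = torch.randn(1, 1024, 1)
cost = model(feats, offsets)
```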
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
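The abstract does not specify how the language interaction works; one common pattern, assumed here purely for illustration, is to render per-pixel semantic features and score them against text embeddings:

```python
import torch

def label_map_from_text(pixel_feats, text_embeds):
    """Hypothetical sketch of open-vocabulary labeling: rendered per-pixel
    semantic features (H, W, D) are scored against one embedding per text
    prompt (C, D); each pixel takes the best-matching prompt's index.
    This is an assumed mechanism, not LSM's documented interface."""
    feats = torch.nn.functional.normalize(pixel_feats, dim=-1)
    texts = torch.nn.functional.normalize(text_embeds, dim=-1)
    scores = feats @ texts.T          # (H, W, C) cosine similarities
    return scores.argmax(dim=-1)      # (H, W) label map

labels = label_map_from_text(torch.randn(64, 64, 512), torch.randn(5, 512))
```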
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - 3D Facial Expressions through Analysis-by-Neural-Synthesis [30.2749903946587]
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics) faithfully reconstructs expressive 3D faces from images.
We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images.
Our qualitative, quantitative, and particularly our perceptual evaluations demonstrate that SMIRK achieves new state-of-the-art performance in accurate expression reconstruction.
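As a rough illustration of the analysis-by-synthesis idea (not SMIRK's actual formulation), a self-supervised photometric objective can be sketched as:

```python
import torch

def photometric_loss(render_fn, params, image, mask=None):
    """Generic analysis-by-synthesis objective (a sketch, not SMIRK's exact
    loss): render the current 3D face estimate and penalize the pixel-wise
    difference to the input image. `render_fn` is a placeholder for a
    differentiable renderer; `params` are the face parameters being fit."""
    rendered = render_fn(params)          # (H, W, 3), differentiable
    diff = (rendered - image).abs()
    if mask is not None:                  # restrict to face pixels
        diff = diff * mask.unsqueeze(-1)
    return diff.mean()

# Toy usage with a trivial "renderer" that just reshapes the parameters.
H = W = 8
params = torch.zeros(H * W * 3, requires_grad=True)
target = torch.rand(H, W, 3)
loss = photometric_loss(lambda p: p.view(H, W, 3), params, target)
loss.backward()
```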
arXiv Detail & Related papers (2024-04-05T14:00:07Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
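A minimal sketch of the HyperNetwork ingredient, assuming a toy two-layer SDF MLP whose weights are emitted from an image embedding; all sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SDFHyperNetwork(nn.Module):
    """Sketch of the hypernetwork idea: an image-conditioned head emits the
    weights of a small SDF MLP, so a single feed-forward pass adapts the
    surface representation to a new object."""
    def __init__(self, cond_dim=256, hidden=64):
        super().__init__()
        self.hidden = hidden
        # Weights and biases for a 2-layer SDF MLP: 3 -> hidden -> 1.
        n_params = (3 * hidden + hidden) + (hidden * 1 + 1)
        self.head = nn.Linear(cond_dim, n_params)

    def forward(self, cond, points):
        # cond: (B, cond_dim) image embedding; points: (B, N, 3) queries
        B, h = cond.shape[0], self.hidden
        p = self.head(cond)
        w1 = p[:, : 3 * h].view(B, 3, h)
        b1 = p[:, 3 * h : 4 * h].view(B, 1, h)
        w2 = p[:, 4 * h : 5 * h].view(B, h, 1)
        b2 = p[:, 5 * h :].view(B, 1, 1)
        x = torch.relu(points @ w1 + b1)   # (B, N, h)
        return x @ w2 + b2                 # (B, N, 1) signed distances

sdf = SDFHyperNetwork()(torch.randn(2, 256), torch.randn(2, 100, 3))
```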
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
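A minimal sketch of the canonical-space idea, assuming a deformation MLP conditioned on a per-frame driving code (e.g. extracted from the monocular video); this is an illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class CanonicalHeadField(nn.Module):
    """Sketch: a deformation MLP warps observation-space query points into a
    shared canonical space given a per-frame driving code, and a canonical
    MLP returns density and color there. Shapes are assumptions."""
    def __init__(self, code_dim=32, hidden=128):
        super().__init__()
        self.deform = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),            # offset into canonical space
        )
        self.canonical = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # (density, r, g, b)
        )

    def forward(self, points, code):
        # points: (N, 3) observation-space samples; code: (code_dim,) per frame
        c = code.expand(points.shape[0], -1)
        canon = points + self.deform(torch.cat([points, c], dim=-1))
        return self.canonical(canon)

out = CanonicalHeadField()(torch.randn(512, 3), torch.randn(32))
```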
arXiv Detail & Related papers (2023-03-25T13:56:33Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
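The general multi-scale recipe can be sketched as follows, assuming a toy three-stream block; it illustrates fusing low-resolution context back into a full-resolution stream, not the paper's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Sketch: extract context at lower resolutions, then upsample and fuse
    it into a stream that never leaves full resolution."""
    def __init__(self, ch=16):
        super().__init__()
        self.full = nn.Conv2d(3, ch, 3, padding=1)
        self.half = nn.Conv2d(3, ch, 3, padding=1)
        self.quarter = nn.Conv2d(3, ch, 3, padding=1)
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f1 = self.full(x)  # full-resolution stream keeps spatial detail
        f2 = F.interpolate(self.half(F.avg_pool2d(x, 2)), size=(h, w),
                           mode="bilinear", align_corners=False)
        f4 = F.interpolate(self.quarter(F.avg_pool2d(x, 4)), size=(h, w),
                           mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([f1, f2, f4], dim=1))

feats = MultiScaleFusion()(torch.randn(1, 3, 64, 64))
```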
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Facial Geometric Detail Recovery via Implicit Representation [147.07961322377685]
We present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image.
Our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces.
Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way.
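The decomposition the summary mentions can be illustrated with a textbook Lambertian model (an assumption made here for clarity; the paper's shading model may differ):

```python
import torch

def lambertian_recompose(albedo, normals, light_dir):
    """Sketch of intrinsic re-composition: an image is rebuilt as
    albedo * shading, with shading = max(0, n . l) computed from per-pixel
    normals and a single directional light."""
    l = light_dir / light_dir.norm()
    shading = (normals * l).sum(dim=-1).clamp(min=0.0)  # (H, W)
    return albedo * shading.unsqueeze(-1)               # (H, W, 3)

img = lambertian_recompose(
    torch.rand(32, 32, 3),
    torch.nn.functional.normalize(torch.randn(32, 32, 3), dim=-1),
    torch.tensor([0.0, 0.0, 1.0]),
)
```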
arXiv Detail & Related papers (2022-03-18T01:42:59Z) - Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
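A minimal sketch of curvature-guided scale selection, assuming a Laplacian-based curvature proxy and two candidate kernel sizes; CDSFNet's actual layer is more elaborate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CurvatureGuidedConv(nn.Module):
    """Sketch: approximate the curvature of the image surface with a
    Laplacian of the grayscale image and blend two receptive-field sizes
    per pixel: high curvature favors the small kernel, flat regions the
    large one."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.small = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.large = nn.Conv2d(in_ch, out_ch, 7, padding=3)
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("lap", lap.view(1, 1, 3, 3))

    def forward(self, x):
        gray = x.mean(dim=1, keepdim=True)
        curv = F.conv2d(gray, self.lap, padding=1).abs()
        w = torch.sigmoid(curv - curv.mean())   # (B, 1, H, W) in (0, 1)
        return w * self.small(x) + (1 - w) * self.large(x)

out = CurvatureGuidedConv()(torch.randn(1, 3, 32, 32))
```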
arXiv Detail & Related papers (2021-12-11T14:41:05Z) - Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - MeshMVS: Multi-View Stereo Guided Mesh Reconstruction [35.763452474239955]
Deep learning based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects.
We propose a multi-view mesh generation method which incorporates geometry information explicitly by using the features from intermediate depth representations of multi-view stereo.
We achieve superior results to state-of-the-art multi-view shape generation methods, with a 34% decrease in Chamfer distance to ground truth and a 14% increase in F1-score on the ShapeNet dataset.
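The two reported metrics can be computed on sampled point sets as follows; the F1 distance threshold tau is an assumption, as the summary does not state it:

```python
import torch

def chamfer_and_f1(pred, gt, tau=0.01):
    """Chamfer distance: mean nearest-neighbor distance in both directions
    between point sets pred (N, 3) and gt (M, 3). F1 combines the fraction
    of predicted points within tau of ground truth (precision) and vice
    versa (recall)."""
    d = torch.cdist(pred, gt)           # (N, M) pairwise distances
    d_pg = d.min(dim=1).values          # pred -> gt
    d_gp = d.min(dim=0).values          # gt -> pred
    chamfer = d_pg.mean() + d_gp.mean()
    precision = (d_pg < tau).float().mean()
    recall = (d_gp < tau).float().mean()
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer, f1

cd, f1 = chamfer_and_f1(torch.rand(1000, 3), torch.rand(1200, 3))
```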
arXiv Detail & Related papers (2020-10-17T00:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.