DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
- URL: http://arxiv.org/abs/2305.11522v1
- Date: Fri, 19 May 2023 08:43:37 GMT
- Title: DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
- Authors: Heyuan Li, Bo Wang, Yu Cheng, Mohan Kankanhalli, Robby T. Tan
- Abstract summary: The state-of-the-art 3DMM-based method directly regresses the model's coefficients.
We propose a fusion network that combines the advantages of both the image and model space predictions.
- Score: 34.223372986832544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sensitivity to severe occlusion and large view angles limits the usage
scenarios of the existing monocular 3D dense face alignment methods. The
state-of-the-art 3DMM-based method directly regresses the model's
coefficients, underutilizing the low-level 2D spatial and semantic information,
which can actually offer cues for face shape and orientation. In this work, we
demonstrate how modeling 3D facial geometry in image and model space jointly
can solve the occlusion and view angle problems. Instead of predicting the
whole face directly, we regress image space features in the visible facial
region by dense prediction first. Subsequently, we predict our model's
coefficients based on the regressed feature of the visible regions, leveraging
the prior knowledge of whole face geometry from the morphable models to
complete the invisible regions. We further propose a fusion network that
combines the advantages of both the image and model space predictions to
achieve high robustness and accuracy in unconstrained scenarios. Thanks to the
proposed fusion module, our method is robust not only to occlusion and large
pitch and roll view angles, which is the benefit of our image space approach,
but also to noise and large yaw angles, which is the benefit of our model space
method. Comprehensive evaluations demonstrate the superior performance of our
method compared with the state-of-the-art methods. On the 3D dense face
alignment task, we achieve 3.80% NME on the AFLW2000-3D dataset, which
outperforms the state-of-the-art method by 5.5%. Code is available at
https://github.com/lhyfst/DSFNet.
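
To make the dual-space idea concrete, the sketch below is a minimal, illustrative layout and not the released DSFNet code: an image-space head makes dense predictions that are weighted by a predicted visibility mask, a model-space head regresses 3DMM coefficients, and a small gated fusion combines the two estimates. All module names, feature sizes, and the 62-coefficient output are assumptions for illustration only.

```python
# Illustrative sketch only (not the authors' code): a dual-branch layout in which
# an image-space head makes dense per-pixel predictions on the visible region,
# a model-space head regresses 3DMM coefficients, and a small fusion module
# combines the two estimates. All module names and sizes are assumptions.
import torch
import torch.nn as nn

class DualSpaceSketch(nn.Module):
    def __init__(self, feat_dim=64, n_coeffs=62):
        super().__init__()
        # Shared backbone producing a low-level feature map (stand-in for any CNN).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Image-space branch: dense prediction of per-pixel geometry cues plus
        # a visibility mask, so occluded pixels can be down-weighted.
        self.dense_head = nn.Conv2d(feat_dim, 3, 1)       # e.g. per-pixel XYZ cues
        self.visibility_head = nn.Conv2d(feat_dim, 1, 1)  # per-pixel visibility logit
        # Model-space branch: regress 3DMM coefficients from pooled features.
        self.coeff_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, n_coeffs)
        )
        # Fusion: summarize the image-space cues into the same coefficient space,
        # then gate between the two coefficient estimates.
        self.img_to_coeffs = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, n_coeffs)
        )
        self.fusion_gate = nn.Linear(2 * n_coeffs, n_coeffs)

    def forward(self, image):
        feats = self.backbone(image)
        visibility = torch.sigmoid(self.visibility_head(feats))
        dense = self.dense_head(feats) * visibility       # keep visible-region cues
        coeffs_model = self.coeff_head(feats)             # model-space estimate
        coeffs_image = self.img_to_coeffs(dense)          # image-space estimate
        gate = torch.sigmoid(self.fusion_gate(torch.cat([coeffs_model, coeffs_image], dim=1)))
        return gate * coeffs_model + (1 - gate) * coeffs_image

# Example usage on a dummy batch.
if __name__ == "__main__":
    net = DualSpaceSketch()
    out = net(torch.randn(2, 3, 128, 128))
    print(out.shape)  # torch.Size([2, 62])
```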
Related papers
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z)
- STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion [35.42718669331158]
Existing models usually ignore spatial and temporal information, which might lead to mesh and image misalignment and temporal discontinuity.
As a video-based model, STAF leverages coherence clues from human motion through an attention-based Temporal Coherence Fusion Module.
In addition, we propose an Average Pooling Module (APM) to allow the model to focus on the entire input sequence rather than just the target frame.
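
A minimal sketch of the kind of attention-based temporal fusion described above, assuming per-frame features and a target-frame query plus a sequence-wide average-pooling path; module names and shapes are illustrative assumptions, not STAF's actual implementation:

```python
# Illustrative sketch (not STAF's code): attention-based fusion of per-frame
# features, with the target frame as the query, plus an average-pooling path
# over the whole sequence. Shapes and module names are assumptions.
import torch
import torch.nn as nn

class TemporalFusionSketch(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, frame_feats, target_idx):
        # frame_feats: (B, T, dim) per-frame features; target_idx: index of the target frame.
        query = frame_feats[:, target_idx:target_idx + 1]       # (B, 1, dim)
        fused, _ = self.attn(query, frame_feats, frame_feats)   # temporal attention
        pooled = frame_feats.mean(dim=1, keepdim=True)          # average pooling over the sequence
        return self.proj(torch.cat([fused, pooled], dim=-1)).squeeze(1)

if __name__ == "__main__":
    feats = torch.randn(2, 9, 256)  # a 9-frame window
    print(TemporalFusionSketch()(feats, target_idx=4).shape)  # torch.Size([2, 256])
```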
arXiv Detail & Related papers (2024-01-03T13:07:14Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- RAFaRe: Learning Robust and Accurate Non-parametric 3D Face Reconstruction from Pseudo 2D&3D Pairs [13.11105614044699]
We propose a robust and accurate non-parametric method for single-view 3D face reconstruction (SVFR).
A large-scale pseudo 2D&3D dataset is created by first rendering the detailed 3D faces, then swapping the face in the wild images with the rendered face.
Our model outperforms previous methods on FaceScape-wild/lab and MICC benchmarks.
arXiv Detail & Related papers (2023-02-10T19:40:26Z)
- Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the 3D self-occlusions of foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
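
One plausible, simplified reading of a camera-space ray-marching pass that turns a 3D occupancy volume into a 2D self-occlusion map is sketched below; the grid layout and normalization are assumptions, and this is not the RiCS formulation itself:

```python
# Illustrative sketch, not the RiCS formulation: given a binary occupancy volume
# already resampled into camera space (one depth column per pixel), march each
# ray front-to-back, find the first occupied sample (the visible surface), and
# record how much occupied volume lies behind it as a per-pixel self-occlusion
# value. The grid layout and normalization are assumptions.
import numpy as np

def self_occlusion_map(occupancy):
    """occupancy: (H, W, D) array in {0, 1}, indexed front-to-back along depth."""
    H, W, D = occupancy.shape
    occluded = np.zeros((H, W), dtype=np.float32)
    for y in range(H):
        for x in range(W):
            column = occupancy[y, x]
            hits = np.flatnonzero(column)
            if hits.size == 0:
                continue                       # background pixel: nothing to occlude
            first_hit = hits[0]                # visible surface sample
            behind = column[first_hit + 1:]    # samples hidden behind the surface
            occluded[y, x] = behind.sum() / D  # fraction of the ray that is self-occluded
    return occluded

if __name__ == "__main__":
    vol = np.zeros((4, 4, 8), dtype=np.uint8)
    vol[1:3, 1:3, 2] = 1      # thin front surface
    vol[1:3, 1:3, 5:7] = 1    # occupied region hidden behind it
    print(self_occlusion_map(vol))
```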
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset [36.688730105295015]
FaceVerse is built from hybrid East Asian face datasets containing 60K fused RGB-D images and 2K high-fidelity 3D head scan models.
In the coarse module, we generate a base parametric model from large-scale RGB-D images, which is able to predict accurate rough 3D face models in different genders, ages, etc.
In the fine module, a conditional StyleGAN architecture trained with high-fidelity scan models is introduced to enrich elaborate facial geometric and texture details.
arXiv Detail & Related papers (2022-03-26T12:13:14Z) - AvatarMe++: Facial Shape and BRDF Inference with Photorealistic
Rendering-Aware GANs [119.23922747230193]
We introduce the first method that is able to reconstruct render-ready 3D facial geometry and BRDF from a single "in-the-wild" image.
Our method outperforms the existing arts by a significant margin and reconstructs high-resolution 3D faces from a single low-resolution image.
arXiv Detail & Related papers (2021-12-11T11:36:30Z) - Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z) - Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic
Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
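
A minimal sketch of a cylindrical partition of LiDAR points follows, assuming arbitrary bin counts and ranges; it only illustrates the partitioning idea, not the Cylinder3D implementation:

```python
# Illustrative sketch (not the Cylinder3D code): assign Cartesian LiDAR points to a
# cylindrical voxel grid (radius, azimuth, height). The bin counts and ranges here
# are arbitrary assumptions, chosen only to show the partitioning idea.
import numpy as np

def cylindrical_voxel_indices(points, n_rho=480, n_phi=360, n_z=32,
                              rho_max=50.0, z_min=-4.0, z_max=2.0):
    """points: (N, 3) array of x, y, z. Returns (N, 3) integer voxel indices."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)   # distance from the sensor axis
    phi = np.arctan2(y, x)           # azimuth in (-pi, pi]
    rho_idx = np.clip((rho / rho_max * n_rho).astype(int), 0, n_rho - 1)
    phi_idx = np.clip(((phi + np.pi) / (2 * np.pi) * n_phi).astype(int), 0, n_phi - 1)
    z_idx = np.clip(((z - z_min) / (z_max - z_min) * n_z).astype(int), 0, n_z - 1)
    return np.stack([rho_idx, phi_idx, z_idx], axis=1)

if __name__ == "__main__":
    pts = np.random.uniform([-50, -50, -4], [50, 50, 2], size=(5, 3))
    print(cylindrical_voxel_indices(pts))
```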
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
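
A minimal sketch of the decompose-and-aggregate idea, assuming one global stream plus per-part local streams whose features are concatenated for rotation-based pose prediction; the part count, feature sizes, and 6D rotation head are illustrative assumptions, not DaNet's actual design:

```python
# Illustrative sketch (not DaNet itself): one global stream for shape/camera and
# several part-wise local streams whose features are aggregated for rotation-based
# pose prediction. The part count and feature sizes are assumptions.
import torch
import torch.nn as nn

class DecomposeAggregateSketch(nn.Module):
    def __init__(self, feat_dim=128, n_parts=14, n_joints=24):
        super().__init__()
        self.global_stream = nn.Linear(feat_dim, 10 + 3)           # shape + camera params
        self.local_streams = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(n_parts)  # one stream per body part
        )
        self.pose_head = nn.Linear(n_parts * feat_dim, n_joints * 6)  # 6D rotation per joint

    def forward(self, global_feat, part_feats):
        # global_feat: (B, feat_dim); part_feats: (B, n_parts, feat_dim)
        shape_cam = self.global_stream(global_feat)
        messages = [s(part_feats[:, i]) for i, s in enumerate(self.local_streams)]
        pose = self.pose_head(torch.cat(messages, dim=1))          # aggregate part messages
        return shape_cam, pose

if __name__ == "__main__":
    net = DecomposeAggregateSketch()
    sc, pose = net(torch.randn(2, 128), torch.randn(2, 14, 128))
    print(sc.shape, pose.shape)  # torch.Size([2, 13]) torch.Size([2, 144])
```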
arXiv Detail & Related papers (2019-12-31T15:09:51Z)