Depth as Attention for Face Representation Learning
- URL: http://arxiv.org/abs/2101.00652v2
- Date: Mon, 5 Apr 2021 08:56:35 GMT
- Title: Depth as Attention for Face Representation Learning
- Authors: Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan and Ali
Etemad
- Abstract summary: A novel depth-guided attention mechanism is proposed for deep multi-modal face recognition using low-cost RGB-D sensors.
Our solution achieves average (increased) accuracies of 87.3% (+5.0%), 99.1% (+0.9%), 99.7% (+0.6%) and 95.3% (+0.5%) for the four datasets respectively.
- Score: 11.885178256393893
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Face representation learning solutions have recently achieved great success
for various applications such as verification and identification. However, face
recognition approaches that are based purely on RGB images rely solely on
intensity information, and therefore are more sensitive to facial variations,
notably pose, occlusions, and environmental changes such as illumination and
background. A novel depth-guided attention mechanism is proposed for deep
multi-modal face recognition using low-cost RGB-D sensors. Our attention
mechanism directs the deep network "where to look" for visual features in the
RGB image by focusing the attention of the network using depth features
extracted by a Convolutional Neural Network (CNN). The depth features help the
network focus on regions of the face in the RGB image that contain more
prominent person-specific information. Our attention mechanism then uses this
correlation to generate an attention map for RGB images from the depth
features extracted by the CNN. We test our network on four public datasets,
showing that the features obtained by our proposed solution yield better
results on the Lock3DFace, CurtinFaces, IIIT-D RGB-D, and KaspAROV datasets,
which include
challenging variations in pose, occlusion, illumination, expression, and
time-lapse. Our solution achieves average (increased) accuracies of 87.3%
(+5.0%), 99.1% (+0.9%), 99.7% (+0.6%) and 95.3% (+0.5%) on the four datasets,
respectively, thereby improving the state-of-the-art. We also perform
additional experiments with thermal images instead of depth images, showing
the high generalization ability of our solution when other modalities are used
to guide the attention mechanism in place of depth information.
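The abstract describes the mechanism but includes no code; the following PyTorch sketch shows one minimal way such depth-guided attention could be wired up. All class and variable names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DepthGuidedAttention(nn.Module):
    """Minimal sketch: depth features yield a spatial attention map that
    tells the RGB branch "where to look". The layout is an assumption,
    not the paper's actual architecture."""

    def __init__(self, depth_channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(depth_channels, depth_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(depth_channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel importance in (0, 1)
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat:   (B, C_rgb, H, W) features from the RGB CNN
        # depth_feat: (B, C_d,   H, W) features from the depth CNN
        attn_map = self.attn(depth_feat)  # (B, 1, H, W)
        return rgb_feat * attn_map        # broadcast over RGB channels

# Toy usage:
rgb = torch.randn(2, 64, 28, 28)
depth = torch.randn(2, 32, 28, 28)
attended = DepthGuidedAttention(32)(rgb, depth)  # shape (2, 64, 28, 28)
```

The attended RGB features would then feed the recognition head as usual, so depth only steers attention rather than entering the identity embedding directly.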
Related papers
- Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB [48.31210455404533]
A heatmap-based 3D pose estimator learns to hallucinate depth information from the RGB frames given at inference time.
Depth information is used exclusively during training by enforcing the RGB-based hallucination network to learn features similar to those of a backbone pre-trained only on depth data.
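Read this way, the scheme is a feature-level distillation: a frozen depth-pretrained backbone provides targets that the RGB network must match. A minimal sketch under that reading (the toy networks and the MSE objective are assumptions, not details from the abstract):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two backbones; any feature extractors
# with matching output shapes would do.
rgb_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
depth_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
depth_net.requires_grad_(False)  # pre-trained on depth, kept frozen

def privileged_feature_loss(rgb_img, depth_img):
    # The RGB branch must "hallucinate" the features the depth backbone
    # would have produced; depth is never needed at test time.
    with torch.no_grad():
        target = depth_net(depth_img)
    pred = rgb_net(rgb_img)
    return nn.functional.mse_loss(pred, target)

loss = privileged_feature_loss(torch.randn(4, 3, 32, 32), torch.randn(4, 1, 32, 32))
loss.backward()  # gradients flow only into rgb_net
```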
arXiv Detail & Related papers (2024-09-17T11:59:34Z)
- Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition [61.27785140017464]
We introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise.
We further design a powerful recognition network called Lightweight Depth and Normal Fusion network (LDNFNet) to learn unique and complementary features between different modalities.
arXiv Detail & Related papers (2024-01-01T10:46:42Z)
- AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion [1.8820731605557168]
We propose a new model for depth image completion based on the Attention Guided Gated-convolutional Network (AGG-Net).
In the encoding stage, an Attention Guided Gated-Convolution (AG-GConv) module is proposed to realize the fusion of depth and color features at different scales.
In the decoding stage, an Attention Guided Skip Connection (AG-SC) module is presented to avoid introducing too many depth-irrelevant features to the reconstruction.
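The abstract names the AG-GConv module without giving its internals; a plausible minimal form of a gated convolution whose gate is driven jointly by depth and color features might look as follows (the layout is assumed, not taken from the AGG-Net paper):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of an attention-guided gated convolution: a gate computed
    from concatenated depth and color features modulates the convolved
    color features. Assumed layout, not AGG-Net's actual design."""

    def __init__(self, channels: int):
        super().__init__()
        self.feature = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel gate in (0, 1)
        )

    def forward(self, depth_feat, color_feat):
        g = self.gate(torch.cat([depth_feat, color_feat], dim=1))
        return self.feature(color_feat) * g  # suppress irrelevant features

out = GatedFusion(32)(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16))
```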
arXiv Detail & Related papers (2023-09-04T14:16:08Z)
- Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from the RGB image and the point cloud, respectively.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
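As a rough illustration of scale-wise fusion, assuming the point-cloud features have already been projected into image-aligned maps, each scale can be fused with a 1x1 convolution after channel concatenation (a generic skeleton, not PDFNet's actual design):

```python
import torch
import torch.nn as nn

class PyramidFusion(nn.Module):
    """Fuse two feature pyramids scale by scale. Hypothetical skeleton:
    concatenate per-scale features, then mix with a 1x1 conv."""

    def __init__(self, channels_per_scale):
        super().__init__()
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in channels_per_scale
        )

    def forward(self, pyr_a, pyr_b):
        return [f(torch.cat([a, b], dim=1))
                for f, a, b in zip(self.fuse, pyr_a, pyr_b)]

pyr_rgb = [torch.randn(1, 64, 8, 8), torch.randn(1, 32, 16, 16)]
pyr_pts = [torch.randn(1, 64, 8, 8), torch.randn(1, 32, 16, 16)]
fused = PyramidFusion([64, 32])(pyr_rgb, pyr_pts)
```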
arXiv Detail & Related papers (2023-07-12T09:33:21Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Improving 2D face recognition via fine-level facial depth generation and RGB-D complementary feature learning [0.8223798883838329]
We propose a fine-grained facial depth generation network and an improved multimodal complementary feature learning network.
Experiments on the Lock3DFace dataset and the IIIT-D dataset show that the proposed FFDGNet and IMCFLNet can improve the accuracy of RGB-D face recognition.
arXiv Detail & Related papers (2023-05-08T02:33:59Z)
- Physically-Based Face Rendering for NIR-VIS Face Recognition [165.54414962403555]
Near infrared (NIR) to Visible (VIS) face matching is challenging due to the significant domain gaps.
We propose a novel method for paired NIR-VIS facial image generation.
To facilitate the identity feature learning, we propose an IDentity-based Maximum Mean Discrepancy (ID-MMD) loss.
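MMD compares two feature distributions through kernel means; an identity-based variant would apply it to identity features extracted from the NIR and VIS domains. A generic RBF-kernel squared-MMD estimate is sketched below; the paper's exact ID-MMD formulation may differ:

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel
    between feature batches x (n, d) and y (m, d). Generic form only."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)        # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

nir_feats = torch.randn(8, 128)  # identity features from NIR images
vis_feats = torch.randn(8, 128)  # identity features from VIS images
loss = rbf_mmd(nir_feats, vis_feats)  # small when the domains align
```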
arXiv Detail & Related papers (2022-11-11T18:48:16Z)
- High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network [16.50097148165777]
Deep learning approaches have achieved highly accurate face recognition by training the models with very large face image datasets.
Unlike the availability of large 2D face image datasets, there is a lack of large 3D face datasets available to the public.
This paper proposes two CNN models to improve the RGB-D face recognition task.
arXiv Detail & Related papers (2021-12-22T07:46:23Z)
- MobileSal: Extremely Efficient RGB-D Salient Object Detection [62.04876251927581]
This paper introduces a novel network, MobileSal, which focuses on efficient RGB-D salient object detection (SOD).
We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD.
With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on seven challenging RGB-D SOD datasets.
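One natural reading of implicit depth restoration is an auxiliary, training-only head that reconstructs depth from RGB features, so the backbone is pushed to encode depth cues it can reuse at RGB-only inference. A minimal sketch under that reading (the head design and L1 loss are assumptions):

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 32, 3, padding=1)  # stand-in RGB feature extractor
depth_head = nn.Conv2d(32, 1, 1)           # auxiliary head, dropped at inference

def idr_style_loss(rgb, depth_gt):
    feat = backbone(rgb)
    pred = depth_head(feat)                 # reconstruct depth from RGB features
    return nn.functional.l1_loss(pred, depth_gt)

loss = idr_style_loss(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
loss.backward()  # depth supervision shapes the RGB backbone during training
```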
arXiv Detail & Related papers (2020-12-24T04:36:42Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference.
Our method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.