Two-Level Attention-based Fusion Learning for RGB-D Face Recognition
- URL: http://arxiv.org/abs/2003.00168v3
- Date: Sun, 18 Oct 2020 10:20:01 GMT
- Title: Two-Level Attention-based Fusion Learning for RGB-D Face Recognition
- Authors: Hardik Uppal, Alireza Sepas-Moghaddam, Michael Greenspan and Ali Etemad
- Abstract summary: A novel attention-aware method is proposed to fuse two image modalities, RGB and depth, for enhanced RGB-D facial recognition.
The proposed method first extracts features from both modalities using a convolutional feature extractor.
These features are then fused using a two-layer attention mechanism.
- Score: 21.735238213921804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With recent advances in RGB-D sensing technologies as well as improvements in
machine learning and fusion techniques, RGB-D facial recognition has become an
active area of research. A novel attention-aware method is proposed to fuse two
image modalities, RGB and depth, for enhanced RGB-D facial recognition. The
proposed method first extracts features from both modalities using a
convolutional feature extractor. These features are then fused using a
two-layer attention mechanism. The first layer focuses on the fused feature
maps generated by the feature extractor, exploiting the relationship between
feature maps using LSTM recurrent learning. The second layer focuses on the
spatial features of those maps using convolution. The training database is
preprocessed and augmented through a set of geometric transformations, and the
learning process is further aided using transfer learning from a pure 2D RGB
image training process. Comparative evaluations demonstrate that the proposed
method outperforms other state-of-the-art approaches, including both
traditional and deep neural network-based methods, on the challenging
CurtinFaces and IIIT-D RGB-D benchmark databases, achieving classification
accuracies over 98.2% and 99.3%, respectively. The proposed attention mechanism
is also compared with other attention mechanisms and is shown to yield more
accurate results.
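As a purely illustrative aid, the PyTorch sketch below shows one way the described two-level attention could be wired: an LSTM scans the fused feature maps to learn a weight for each map (level one), and a convolution then produces a spatial attention map over the reweighted features (level two). The module names, layer sizes, and the channel-wise concatenation used for the initial fusion are assumptions, not the authors' implementation.

```python
# Hedged sketch of the two-level attention fusion described in the abstract.
# Names, sizes, and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn

class TwoLevelAttentionFusion(nn.Module):
    def __init__(self, channels: int, height: int, width: int, hidden: int = 128):
        super().__init__()
        # Level 1: treat the fused maps as a length-`channels` sequence of
        # flattened maps; the LSTM models relationships between feature maps.
        self.lstm = nn.LSTM(input_size=height * width, hidden_size=hidden,
                            batch_first=True)
        self.map_score = nn.Linear(hidden, 1)  # one attention score per map
        # Level 2: a convolution attends over the spatial dimensions.
        self.spatial = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, rgb_feats: torch.Tensor, depth_feats: torch.Tensor):
        # Fuse by channel-wise concatenation: (B, C, H, W) each -> (B, 2C, H, W).
        x = torch.cat([rgb_feats, depth_feats], dim=1)
        seq = x.flatten(2)                                   # (B, 2C, H*W)
        out, _ = self.lstm(seq)                              # (B, 2C, hidden)
        weights = torch.softmax(self.map_score(out), dim=1)  # (B, 2C, 1)
        x = x * weights.unsqueeze(-1)                        # reweight each map
        att = torch.sigmoid(self.spatial(x))                 # (B, 1, H, W)
        return x * att
```

Here `channels` is the fused channel count; for example, TwoLevelAttentionFusion(channels=1024, height=7, width=7) would sit on top of two 512-channel convolutional streams at 7x7 resolution.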
Related papers
- Multispectral Texture Synthesis using RGB Convolutional Neural Networks [2.3213238782019316]
State-of-the-art RGB texture synthesis algorithms rely on style distances that are computed through statistics of deep features.
We propose two solutions to extend these methods to multispectral imaging.
arXiv Detail & Related papers (2024-10-21T13:49:54Z)
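For context on the style distances the entry above builds on, here is a minimal sketch of the standard Gram-matrix style distance over deep features (e.g. from a fixed VGG); the paper's two multispectral extensions are not reproduced here.

```python
# Minimal sketch of a Gram-matrix style distance over deep features.
# The feature extractor (e.g. a fixed VGG) is assumed to exist elsewhere.
import torch

def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W) activations from one layer of a fixed network.
    b, c, h, w = feats.shape
    f = feats.flatten(2)                          # (B, C, H*W)
    return f @ f.transpose(1, 2) / (c * h * w)    # (B, C, C)

def style_distance(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    # Frobenius distance between the two Gram matrices, summed over the batch.
    return torch.linalg.norm(gram_matrix(feats_a) - gram_matrix(feats_b))
```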
- UGAD: Universal Generative AI Detector utilizing Frequency Fingerprints [18.47018538990973]
Our study introduces a novel multi-modal approach to detect AI-generated images.
Our approach significantly enhances the accuracy of differentiating between real and AI-generated images.
arXiv Detail & Related papers (2024-09-12T10:29:37Z)
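UGAD's actual pipeline is not detailed in this summary; as a hedged illustration of the frequency-fingerprint idea, the sketch below pairs an image with its log-magnitude FFT spectrum, which a downstream classifier could consume alongside the RGB input.

```python
# Hedged sketch of a generic frequency fingerprint: a per-channel
# log-magnitude FFT spectrum. UGAD's real features may differ.
import torch

def frequency_fingerprint(img: torch.Tensor) -> torch.Tensor:
    # img: (B, 3, H, W) in [0, 1].
    spec = torch.fft.fft2(img)                               # 2-D FFT per channel
    return torch.log1p(torch.abs(torch.fft.fftshift(spec)))  # (B, 3, H, W)
```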
- Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis [48.59382455101753]
2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose.
Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information.
In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training.
Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining.
arXiv Detail & Related papers (2024-03-11T09:12:24Z)
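One plausible way to combine separate pre-trained RGB and depth recognizers without paired retraining, as the entry above describes, is confidence-weighted score-level fusion; the weighting below is an illustrative assumption, not the paper's scheme.

```python
# Speculative sketch: confidence-weighted fusion of per-identity scores
# from independent RGB and depth recognizers.
import torch

def fuse_scores(rgb_scores: torch.Tensor, depth_scores: torch.Tensor,
                rgb_conf: torch.Tensor, depth_conf: torch.Tensor) -> torch.Tensor:
    # *_scores: (B, num_identities) similarity scores per modality.
    # *_conf:   (B, 1) per-sample confidence in [0, 1] per modality.
    w = rgb_conf / (rgb_conf + depth_conf + 1e-8)
    return w * rgb_scores + (1.0 - w) * depth_scores
```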
- Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection [10.589062261564631]
RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments.
Existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features.
We first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions.
arXiv Detail & Related papers (2023-09-13T20:47:29Z)
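The summary above names supervised and self-supervised components but not their form; a rough sketch under that assumption pairs a binary cross-entropy saliency term with a cross-modal feature-consistency term, with the weight alpha chosen arbitrarily.

```python
# Rough sketch of a supervised + self-supervised hybrid loss. The
# consistency term and alpha are assumptions, not the MMHL definition.
import torch.nn.functional as F

def hybrid_loss(pred, target, rgb_feat, thermal_feat, alpha: float = 0.1):
    supervised = F.binary_cross_entropy_with_logits(pred, target)
    # Self-supervised: encourage RGB and thermal embeddings to agree.
    consistency = 1.0 - F.cosine_similarity(
        rgb_feat.flatten(1), thermal_feat.flatten(1)).mean()
    return supervised + alpha * consistency
```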
- Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work reports comparative experiments between deep learning and multiset-neuron approaches.
The deep learning approach confirmed its potential for performing image segmentation.
The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z)
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
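A minimal sketch of the dual-backbone idea from the DTMINet entry above, using torchvision's swin_t purely for illustration; the mutual-interaction and decoding modules are omitted, and the naive concatenation at the end is an assumption.

```python
# Hedged sketch: two Swin-T backbones, one per modality. DTMINet's
# interaction modules are omitted; concatenation is a placeholder fusion.
import torch
import torch.nn as nn
from torchvision.models import swin_t

class DualSwinEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_net, self.depth_net = swin_t(weights=None), swin_t(weights=None)
        self.rgb_net.head = nn.Identity()    # expose 768-d embeddings
        self.depth_net.head = nn.Identity()

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, 1, H, W); replicate to 3 channels for the backbone.
        rgb_emb = self.rgb_net(rgb)                            # (B, 768)
        depth_emb = self.depth_net(depth.repeat(1, 3, 1, 1))   # (B, 768)
        return torch.cat([rgb_emb, depth_emb], dim=1)          # (B, 1536)
```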
- Learning Geodesic-Aware Local Features from RGB-D Images [8.115075181267109]
We propose a new approach to compute descriptors from RGB-D images that are invariant to non-rigid deformations.
Our proposed description strategies are grounded on the key idea of learning feature representations on undistorted local image patches.
In experiments on real and publicly available RGB-D benchmarks, our descriptors consistently outperform state-of-the-art handcrafted and learning-based image and RGB-D descriptors.
arXiv Detail & Related papers (2022-03-22T19:52:49Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
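For illustration only, the two modules named in the forgery-detection entry above can be approximated by fixed high-pass filtering at several scales; the paper's learned multi-scale extraction and residual-guided attention are simplified away here.

```python
# Hedged sketch: multi-scale high-frequency residuals via a fixed
# Laplacian-style high-pass kernel. The paper's filters likely differ.
import torch
import torch.nn.functional as F

HIGH_PASS = torch.tensor([[[[-1., -1., -1.],
                            [-1.,  8., -1.],
                            [-1., -1., -1.]]]]) / 8.0   # (1, 1, 3, 3)

def high_freq_residuals(gray: torch.Tensor, scales=(1, 2, 4)):
    # gray: (B, 1, H, W). Returns one high-frequency residual map per scale.
    outs = []
    for s in scales:
        x = F.avg_pool2d(gray, s) if s > 1 else gray      # downsample
        outs.append(F.conv2d(x, HIGH_PASS, padding=1))    # high-pass filter
    return outs
```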
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
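One plausible, simplified reading of the recalibration idea in the entry above, sketched as a squeeze-and-excitation style cross-modal gate; the actual Separation-and-Aggregation Gate is more elaborate, and all names here are mine.

```python
# Hedged sketch: pooled statistics from both modalities drive channel-wise
# gates that recalibrate each stream before aggregation.
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, 2 * channels), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (B, C, H, W). Pool both, predict gates, recalibrate, sum.
        b, c, _, _ = rgb.shape
        pooled = torch.cat([rgb.mean((2, 3)), depth.mean((2, 3))], dim=1)
        gates = self.mlp(pooled).view(b, 2 * c, 1, 1)
        return rgb * gates[:, :c] + depth * gates[:, c:]
```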
This list was automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.