Hypergraph-Guided Disentangled Spectrum Transformer Networks for
Near-Infrared Facial Expression Recognition
- URL: http://arxiv.org/abs/2312.05907v1
- Date: Sun, 10 Dec 2023 15:15:50 GMT
- Title: Hypergraph-Guided Disentangled Spectrum Transformer Networks for
Near-Infrared Facial Expression Recognition
- Authors: Bingjun Luo, Haowen Wang, Jinpeng Wang, Junjie Zhu, Xibin Zhao, Yue
Gao
- Abstract summary: We make the first attempt at deep NIR facial expression recognition and propose a novel method called the near-infrared facial expression transformer (NFER-Former).
NFER-Former disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation.
We have constructed a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the effectiveness of NFER-Former.
- Score: 31.783671943393344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With its strong robustness to illumination variations, near-infrared (NIR)
imaging can be an effective and essential complement to visible (VIS) facial
expression recognition in low-light or completely dark conditions. However,
facial expression recognition (FER) from NIR images presents a more challenging
problem than traditional FER due to the limited data scale and the difficulty
of extracting discriminative features from incomplete visible-lighting content.
In this paper, we make the first attempt at deep NIR facial
expression recognition and propose a novel method called the near-infrared facial
expression transformer (NFER-Former). Specifically, to make full use of the
abundant label information in the field of VIS, we introduce a Self-Attention
Orthogonal Decomposition mechanism that disentangles the expression information
and spectrum information from the input image, so that the expression features
can be extracted without the interference of spectrum variation. We also
propose a Hypergraph-Guided Feature Embedding method that models some key
facial behaviors and learns the structure of the complex correlations between
them, thereby alleviating the interference of inter-class similarity.
Additionally, we have constructed a large NIR-VIS Facial Expression dataset
that includes 360 subjects to better validate the effectiveness of NFER-Former.
Extensive experiments and ablation studies show that NFER-Former significantly
improves the performance of NIR FER and achieves state-of-the-art results on
the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.
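For intuition only: the Self-Attention Orthogonal Decomposition is not spelled out in this summary, but a disentangling split with a soft orthogonality constraint could look roughly like the following PyTorch sketch (all module and variable names are illustrative, not the authors' implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrthogonalDecomposition(nn.Module):
    """Illustrative two-branch decomposition: one head for expression
    features, one for spectrum features, kept (softly) orthogonal."""
    def __init__(self, dim: int):
        super().__init__()
        self.expr_head = nn.Linear(dim, dim)  # expression branch
        self.spec_head = nn.Linear(dim, dim)  # spectrum branch

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, dim) from a transformer encoder
        expr = self.expr_head(tokens)
        spec = self.spec_head(tokens)
        # Soft orthogonality penalty: the cosine similarity between the
        # two normalized feature sets should be close to zero.
        e = F.normalize(expr, dim=-1)
        s = F.normalize(spec, dim=-1)
        ortho_loss = (e * s).sum(dim=-1).pow(2).mean()
        return expr, spec, ortho_loss

# Usage idea: classify expressions from `expr` only and add `ortho_loss`
# (weighted) to the training objective, so that spectrum variation is
# pushed into the `spec` branch.
```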
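Likewise, the Hypergraph-Guided Feature Embedding is only named here; as a hedged stand-in, the standard hypergraph convolution (the HGNN formulation, with unit hyperedge weights) conveys the underlying operation, where nodes could be key facial-behavior features and hyperedges group correlated behaviors.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """Standard hypergraph convolution: X' = Dv^{-1/2} H De^{-1} H^T
    Dv^{-1/2} X Theta (hyperedge weights taken as identity here)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); H: (num_nodes, num_edges) {0,1} float
        # incidence matrix linking nodes to hyperedges.
        Dv = H.sum(dim=1).clamp(min=1)  # node degrees
        De = H.sum(dim=0).clamp(min=1)  # hyperedge degrees
        Dv_inv_sqrt = Dv.pow(-0.5).diag()
        De_inv = De.pow(-1.0).diag()
        G = Dv_inv_sqrt @ H @ De_inv @ H.t() @ Dv_inv_sqrt
        return torch.relu(G @ self.theta(x))
```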
Related papers
- Frequency Domain Modality-invariant Feature Learning for
Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phase-Preserving Normalization (PPNorm).
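As a rough illustration of the frequency-domain idea (not FDMNet's code): a common recipe treats Fourier amplitude as modality/style statistics and phase as spatial structure, e.g.:

```python
import torch

def amplitude_phase(x: torch.Tensor):
    """Split an image batch (B, C, H, W) into Fourier amplitude and
    phase. Common intuition: amplitude carries style/modality
    statistics, phase preserves identity-relevant spatial structure."""
    freq = torch.fft.fft2(x, norm="ortho")
    return torch.abs(freq), torch.angle(freq)

def recombine(amp: torch.Tensor, pha: torch.Tensor) -> torch.Tensor:
    """Rebuild an image from (possibly filtered) amplitude and the
    original phase; the real part recovers the spatial image."""
    freq = torch.polar(amp, pha)
    return torch.fft.ifft2(freq, norm="ortho").real

# E.g., suppressing instance-specific amplitude while keeping phase:
x = torch.randn(4, 3, 64, 64)
amp, pha = amplitude_phase(x)
amp_mean = amp.mean(dim=0, keepdim=True).expand_as(amp)
x_modality_reduced = recombine(amp_mean, pha)
```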
arXiv Detail & Related papers (2024-01-03T17:11:27Z)
- Multi-Energy Guided Image Translation with Stochastic Differential
Equations for Near-Infrared Facial Expression Recognition [32.34873680472637]
We present NFER-SDE, which transfers facial expression images between heterogeneous modalities without overfitting on the small-scale NIR data.
NFER-SDE significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets.
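For orientation, a generic Euler-Maruyama step of a reverse-time SDE sampler (in the style of score-based generative models, not the authors' exact update) looks like:

```python
import torch

def reverse_sde_step(x, t, dt, score_fn, drift_fn, diffusion_fn):
    """One Euler-Maruyama step of the reverse-time SDE
        dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dw,
    integrated backwards (dt < 0). `score_fn` is a trained score
    network; for modality translation it could be conditioned on the
    source NIR/VIS image to steer samples toward the target modality."""
    g = diffusion_fn(t)
    drift = drift_fn(x, t) - (g ** 2) * score_fn(x, t)
    noise = torch.randn_like(x)
    return x + drift * dt + g * (abs(dt) ** 0.5) * noise
```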
arXiv Detail & Related papers (2023-12-10T15:17:42Z)
- Rethinking the Domain Gap in Near-infrared Face Recognition [65.7871950460781]
Heterogeneous face recognition (HFR) involves the intricate task of matching face images across the visual domains of visible (VIS) and near-infrared (NIR) imagery.
Much of the existing literature on HFR identifies the domain gap as a primary challenge and directs efforts towards bridging it at either the input or feature level.
We observe that large neural networks, unlike their smaller counterparts, demonstrate exceptional zero-shot performance in HFR when pre-trained on large-scale homogeneous VIS data.
arXiv Detail & Related papers (2023-12-01T14:43:28Z)
- Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle
Re-identification [29.48387524901101]
In harsh environments, the discriminative cues in RGB and NIR modalities are often lost due to strong flares from vehicle lamps or sunlight.
We propose a Flare-Aware Cross-modal Enhancement Network that adaptively restores flare-corrupted RGB and NIR features with guidance from the flare-immunized thermal infrared spectrum.
arXiv Detail & Related papers (2023-05-23T04:04:24Z)
- Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
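Schematically, a plug-and-play diffusion step alternates a denoising estimate with a data-consistency correction; the sketch below is a generic illustration under assumed callables (`denoiser`, `degrade`), not DiffPIR's actual update rule:

```python
import torch

def pnp_diffusion_step(x_t, y, t, denoiser, degrade, alpha_bar, rho=0.5):
    """Schematic plug-and-play step inside diffusion sampling:
    1) estimate the clean image x0 from the noisy sample x_t,
    2) nudge x0 toward consistency with the measurement y = degrade(x),
    3) (the sampler would then re-noise x0 to the next timestep)."""
    a = alpha_bar[t]
    # 1) x0 estimate via the noise-prediction (Gaussian denoiser) model
    eps = denoiser(x_t, t)
    x0 = (x_t - (1 - a).sqrt() * eps) / a.sqrt()
    # 2) one gradient step on the data-fidelity term ||y - degrade(x0)||^2
    x0 = x0.detach().requires_grad_(True)
    loss = (y - degrade(x0)).pow(2).sum()
    grad = torch.autograd.grad(loss, x0)[0]
    return (x0 - rho * grad).detach()
```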
arXiv Detail & Related papers (2023-05-15T20:24:38Z)
- Physically-Based Face Rendering for NIR-VIS Face Recognition [165.54414962403555]
Near infrared (NIR) to Visible (VIS) face matching is challenging due to the significant domain gaps.
We propose a novel method for paired NIR-VIS facial image generation.
To facilitate the identity feature learning, we propose an IDentity-based Maximum Mean Discrepancy (ID-MMD) loss.
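The ID-MMD loss itself is not detailed in this summary; the underlying Maximum Mean Discrepancy machinery, with an assumed RBF kernel, is sketched below for illustration:

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0):
    """Standard RBF-kernel Maximum Mean Discrepancy between two feature
    batches, e.g. NIR vs. VIS identity embeddings. The actual ID-MMD
    loss operates on identity-based statistics; this shows only the
    generic MMD estimate."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Minimizing rbf_mmd(nir_feats, vis_feats) pulls the two modality
# distributions together in the embedding space.
```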
arXiv Detail & Related papers (2022-11-11T18:48:16Z)
- Generating near-infrared facial expression datasets with dimensional
affect labels [2.367786892039871]
We present two complementary data augmentation methods to create NIR image datasets with dimensional emotion labels.
Our experiments show that these generated NIR datasets are comparable to existing datasets in terms of data quality and baseline prediction performance.
arXiv Detail & Related papers (2022-06-28T11:06:32Z)
- Unsupervised Misaligned Infrared and Visible Image Fusion via
Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a Feature Interaction Fusion Module (IFM).
arXiv Detail & Related papers (2022-05-24T07:51:57Z)
- A Bidirectional Conversion Network for Cross-Spectral Face Recognition [1.9766522384767227]
Cross-spectral face recognition is challenging due to the dramatic difference between visible-light and IR imagery.
This paper proposes a bidirectional cross-spectral conversion framework (BCSC-GAN) between heterogeneous face images.
The network reduces the cross-spectral recognition problem into an intra-spectral problem, and improves performance by fusing bidirectional information.
arXiv Detail & Related papers (2022-05-03T16:20:10Z)
- A Synthesis-Based Approach for Thermal-to-Visible Face Verification [105.63410428506536]
This paper presents an algorithm that achieves state-of-the-art performance on the ARL-VTF and TUFTS multi-spectral face datasets.
We also present MILAB-VTF(B), a challenging multi-spectral face dataset composed of paired thermal and visible videos.
arXiv Detail & Related papers (2021-08-21T17:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.