Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition
- URL: http://arxiv.org/abs/2307.11404v1
- Date: Fri, 21 Jul 2023 07:56:32 GMT
- Title: Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition
- Authors: Isack Lee, Eungi Lee, Seok Bong Yoo
- Abstract summary: The proposed method can detect occlusions, restore the occluded parts of the face as if they were unoccluded, and recognize the expression, improving FER accuracy.
It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded positions by training only on latent vectors from the unoccluded patches.
Second, the hybrid reconstruction network reconstructs the masked positions into a complete image using the ViT and a convolutional neural network (CNN).
Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Most research on facial expression recognition (FER) is conducted in highly
controlled environments, but FER performance is often unacceptable when applied
to real-world situations. This is because when unexpected objects occlude the
face, the FER network has difficulty extracting facial features and
accurately predicting facial expressions. Therefore, occluded FER (OFER) is a
challenging problem. Previous studies on occlusion-aware FER have typically
required fully annotated facial images for training. However, collecting facial
images with various occlusions and expression annotations is time-consuming and
expensive. Latent-OFER, the proposed method, can detect occlusions, restore
occluded parts of the face as if they were unoccluded, and recognize the
expression, improving FER accuracy. This approach involves three steps: First, the vision
transformer (ViT)-based occlusion patch detector masks the occluded positions by
training only on latent vectors from the unoccluded patches using the support
vector data description algorithm. Second, the hybrid reconstruction network
reconstructs the masked positions into a complete image using the ViT and a
convolutional neural network (CNN). Last, the expression-relevant latent vector
extractor retrieves and uses expression-related information from all latent
vectors by applying a CNN-based class activation map. This mechanism helps
prevent performance degradation caused by occlusion from unseen objects.
Experimental results on several databases show that the proposed method
outperforms state-of-the-art methods.
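The first step, flagging patches whose latent vectors do not fit the distribution learned from unoccluded patches, can be illustrated with a minimal one-class sketch. It is not the authors' code and makes several assumptions: patch latent vectors are given as plain lists of floats (in the paper they come from a ViT encoder), and instead of the minimum-volume hypersphere that support vector data description (SVDD) obtains by optimization, the sphere's center is fixed at the mean vector and its radius is taken as a quantile of training distances. The names `fit_hypersphere`, `occlusion_mask`, and `apply_patch_mask` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Toy stand-in for ViT patch embeddings (assumption: one latent vector per patch).
Vector = Sequence[float]


def fit_hypersphere(normal_latents: List[Vector], quantile: float = 0.95) -> Tuple[List[float], float]:
    """Fit a center and radius around latent vectors of unoccluded patches.

    Simplified one-class description (assumption): the center is the mean
    vector and the radius is an empirical distance quantile, rather than the
    minimum-volume sphere that SVDD obtains by optimization.
    """
    n = len(normal_latents)
    dim = len(normal_latents[0])
    center = [sum(v[d] for v in normal_latents) / n for d in range(dim)]
    dists = sorted(math.dist(v, center) for v in normal_latents)
    # Index of the requested quantile, clamped to a valid position.
    idx = min(n - 1, max(0, math.ceil(quantile * n) - 1))
    return center, dists[idx]


def occlusion_mask(patch_latents: List[Vector], center: Vector, radius: float) -> List[bool]:
    """Mark a patch True (treated as occluded) when its latent lies outside the sphere."""
    return [math.dist(v, center) > radius for v in patch_latents]


def apply_patch_mask(patches: List[Vector], mask: List[bool], fill: float = 0.0) -> List[List[float]]:
    """Blank out flagged patches so a later reconstruction stage can fill them in."""
    return [[fill] * len(p) if m else list(p) for p, m in zip(patches, mask)]


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
    center, radius = fit_hypersphere(train, quantile=1.0)
    test_latents = [[0.05, 0.05], [2.0, 2.0]]
    mask = occlusion_mask(test_latents, center, radius)
    print(mask)  # [False, True]
```

Only unoccluded data is needed to fit the sphere, which mirrors the abstract's point that the detector is trained on latent vectors from unoccluded patches alone; any patch far from that region, whatever the occluding object, is flagged. The blanked patches would then be passed to the reconstruction stage, which this sketch does not model.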