Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for
Occluded Facial Expression Recognition
- URL: http://arxiv.org/abs/2307.11404v1
- Date: Fri, 21 Jul 2023 07:56:32 GMT
- Title: Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for
Occluded Facial Expression Recognition
- Authors: Isack Lee, Eungi Lee, Seok Bong Yoo
- Abstract summary: The proposed method can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy.
It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches.
Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN).
Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Most research on facial expression recognition (FER) is conducted in highly
controlled environments, but its performance is often unacceptable when applied
to real-world situations. This is because when unexpected objects occlude the
face, the FER network faces difficulties extracting facial features and
accurately predicting facial expressions. Therefore, occluded FER (OFER) is a
challenging problem. Previous studies on occlusion-aware FER have typically
required fully annotated facial images for training. However, collecting facial
images with various occlusions and expression annotations is time-consuming and
expensive. Latent-OFER, the proposed method, can detect occlusions, restore
occluded parts of the face as if they were unoccluded, and recognize them,
improving FER accuracy. This approach involves three steps: First, the vision
transformer (ViT)-based occlusion patch detector masks the occluded position by
training only latent vectors from the unoccluded patches using the support
vector data description algorithm. Second, the hybrid reconstruction network
generates the masking position as a complete image using the ViT and
convolutional neural network (CNN). Last, the expression-relevant latent vector
extractor retrieves and uses expression-related information from all latent
vectors by applying a CNN-based class activation map. This mechanism has a
significant advantage in preventing performance degradation from occlusion by
unseen objects. The experimental results on several databases demonstrate the
superiority of the proposed method over state-of-the-art methods.
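The first step in the abstract (learning only from latent vectors of unoccluded patches, then masking patches that do not fit) can be illustrated with a rough one-class sketch. This is not the authors' implementation: true support vector data description solves an optimization problem for the enclosing hypersphere, whereas the sketch below substitutes a simpler stand-in (center = mean of the training latents, radius = a quantile of their distances). The function names `fit_hypersphere` and `occluded_patch_mask`, the `quantile` parameter, and the toy 2-D latent vectors are all assumptions made for illustration.

```python
import math


def fit_hypersphere(latents, quantile=0.95):
    """Fit a center and radius on latent vectors of unoccluded patches.

    Simplified stand-in for SVDD (assumption): the center is the mean
    vector and the radius is a quantile of the training distances.
    """
    dim = len(latents[0])
    center = [sum(v[i] for v in latents) / len(latents) for i in range(dim)]
    dists = sorted(math.dist(v, center) for v in latents)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]


def occluded_patch_mask(patch_latents, center, radius):
    """Return True for each patch whose latent lies outside the hypersphere."""
    return [math.dist(v, center) > radius for v in patch_latents]


# Toy latent vectors standing in for ViT patch embeddings (hypothetical data).
unoccluded = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.05]]
center, radius = fit_hypersphere(unoccluded)

# One patch close to the training distribution, one far away.
mask = occluded_patch_mask([[0.02, 0.0], [3.0, 3.0]], center, radius)
print(mask)
```

Patches flagged `True` would be the ones handed to a reconstruction stage. Because the detector is fitted only on unoccluded data, it needs no examples of specific occluding objects, which is consistent with the abstract's claim about robustness to unseen objects.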
Related papers
- DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion [94.46904504076124]
Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.
Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations.
We introduce DiffusionFake, a novel framework that reverses the generative process of face forgeries to enhance the generalization of detection models.
arXiv Detail & Related papers (2024-10-06T06:22:43Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- Seeing through the Mask: Multi-task Generative Mask Decoupling Face Recognition [47.248075664420874]
Current general face recognition systems suffer from serious performance degradation when encountering occluded scenes.
This paper proposes a Multi-task gEnerative mask dEcoupling face Recognition (MEER) network to jointly handle these two tasks.
We first present a novel mask decoupling module to disentangle mask and identity information, which makes the network obtain purer identity features from visible facial components.
arXiv Detail & Related papers (2023-11-20T03:23:03Z)
- COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection [56.7599217711363]
Most face forgery recognition methods can only process one face at a time.
We propose COMICS, an end-to-end framework for multi-face forgery detection.
arXiv Detail & Related papers (2023-08-03T03:37:13Z)
- Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning [23.062034116854875]
In the absence of vaccines or medicines to stop COVID-19, one of the effective methods to slow the spread of the coronavirus is to wear a face mask.
To mandate the use of face masks or coverings in public areas, additional human resources are required, which is tedious and attention-intensive.
We propose a face mask detection framework that uses the context attention module to enable the effective attention of the feed-forward convolution neural network.
arXiv Detail & Related papers (2021-10-01T16:44:06Z)
- End2End Occluded Face Recognition by Masking Corrupted Features [82.27588990277192]
State-of-the-art general face recognition models do not generalize well to occluded face images.
This paper presents a novel face recognition method that is robust to occlusions based on a single end-to-end deep neural network.
Our approach, named FROM (Face Recognition with Occlusion Masks), learns to discover the corrupted features from the deep convolutional neural networks, and clean them by the dynamically learned masks.
arXiv Detail & Related papers (2021-08-21T09:08:41Z)
- Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition [56.11054589916299]
We propose a landmark-guided attention branch to find and discard corrupted features from occluded regions.
An attention map is first generated to indicate if a specific facial part is occluded and guide our model to attend to non-occluded regions.
This results in more diverse and discriminative features, enabling the expression recognition system to recover even though the face is partially occluded.
arXiv Detail & Related papers (2020-05-12T20:42:55Z)
- Fake face detection via adaptive manipulation traces extraction network [9.892936175042939]
We propose an adaptive manipulation traces extraction network (AMTEN) to suppress image content and highlight manipulation traces.
AMTEN exploits an adaptive convolution layer to predict manipulation traces in the image, which are reused in subsequent layers to maximize manipulation artifacts.
When detecting fake face images generated by various FIM techniques, AMTENnet achieves an average accuracy of up to 98.52%, outperforming state-of-the-art works.
arXiv Detail & Related papers (2020-05-11T09:16:39Z)
- Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing [61.82466976737915]
Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing.
We propose a new approach to detect presentation attacks from multiple frames based on two insights.
The proposed approach achieves state-of-the-art results on five benchmark datasets.
arXiv Detail & Related papers (2020-03-18T06:11:20Z)