Related papers: Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition

URL: http://arxiv.org/abs/2307.11404v1
Date: Fri, 21 Jul 2023 07:56:32 GMT
Title: Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition
Authors: Isack Lee, Eungi Lee, Seok Bong Yoo
Abstract summary: The proposed method, Latent-OFER, can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. It involves three steps. First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches. Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and a convolutional neural network (CNN). Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Most research on facial expression recognition (FER) is conducted in highly controlled environments, but its performance is often unacceptable when applied to real-world situations. This is because when unexpected objects occlude the face, the FER network faces difficulties extracting facial features and accurately predicting facial expressions. Therefore, occluded FER (OFER) is a challenging problem. Previous studies on occlusion-aware FER have typically required fully annotated facial images for training. However, collecting facial images with various occlusions and expression annotations is time-consuming and expensive. Latent-OFER, the proposed method, can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. This approach involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches using the support vector data description algorithm. Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN). Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map. This mechanism has a significant advantage in preventing performance degradation from occlusion by unseen objects. The experimental results on several databases demonstrate the superiority of the proposed method over state-of-the-art methods.
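The first step in the abstract, flagging occluded patches by modeling only the latent vectors of unoccluded patches, can be illustrated with a minimal one-class sketch. This is not the authors' implementation. It replaces the support vector data description optimization with a simplified stand-in (mean vector as center, a quantile of training distances as radius), and it assumes the latent vectors are plain lists of floats rather than ViT encoder outputs. The function names `fit_hypersphere` and `occlusion_mask` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_hypersphere(latents: List[Vector], quantile: float = 0.95) -> Tuple[List[float], float]:
    """Fit a center and radius to latent vectors of unoccluded patches.

    Simplification (assumption): the center is the mean vector and the radius
    is a quantile of the training distances, instead of the solution of the
    SVDD optimization problem.
    """
    dim = len(latents[0])
    center = [sum(v[i] for v in latents) / len(latents) for i in range(dim)]
    dists = sorted(distance(v, center) for v in latents)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]


def occlusion_mask(patches: List[Vector], center: Vector, radius: float) -> List[bool]:
    """Return True for each patch whose latent vector lies outside the sphere."""
    return [distance(p, center) > radius for p in patches]


# Toy latent vectors standing in for unoccluded training patches.
train = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.05]]
center, radius = fit_hypersphere(train, quantile=1.0)

# One patch close to the training data, one far away (treated as occluded).
mask = occlusion_mask([[0.0, 0.05], [3.0, 3.0]], center, radius)
print(mask)  # [False, True]
```

Patches flagged `True` would be the ones masked out and passed to the reconstruction step. Because only unoccluded data is used for fitting, no occlusion annotations are needed, which matches the motivation stated in the abstract.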

Related papers

S$^3$POT: Contrast-Driven Face Occlusion Segmentation via Self-Supervised Prompt Learning [46.05577414378133]
We present S$^3$POT, a contrast-driven framework synergizing face generation with self-supervised spatial prompting. In particular, S$^3$POT consists of three modules: Reference Generation, Feature Enhancement, and Prompt Selection. Experiments on a dedicatedly collected dataset demonstrate S$^3$POT's superior performance and the effectiveness of each module.
arXiv Detail & Related papers (2026-01-31T10:05:13Z)
ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos. Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors. Our method is highly robust to corruptions such as blur and compression, highlighting the applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z)
OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion [94.46904504076124]
Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. We introduce DiffusionFake, a novel framework that reverses the generative process of face forgeries to enhance the generalization of detection models.
arXiv Detail & Related papers (2024-10-06T06:22:43Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
Seeing through the Mask: Multi-task Generative Mask Decoupling Face Recognition [47.248075664420874]
Current general face recognition systems suffer from serious performance degradation when encountering occluded scenes. This paper proposes a Multi-task gEnerative mask dEcoupling face Recognition (MEER) network to jointly handle these two tasks. We first present a novel mask decoupling module to disentangle mask and identity information, which lets the network obtain purer identity features from visible facial components.
arXiv Detail & Related papers (2023-11-20T03:23:03Z)
COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection [56.7599217711363]
Most face forgery recognition methods can only process one face at a time. We propose COMICS, an end-to-end framework for multi-face forgery detection.
arXiv Detail & Related papers (2023-08-03T03:37:13Z)
Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning [23.062034116854875]
In the absence of vaccines or medicines to stop COVID-19, one of the effective methods to slow the spread of the coronavirus is to wear a face mask. To mandate the use of face masks or coverings in public areas, additional human resources are required, which is tedious and attention-intensive. We propose a face mask detection framework that uses the context attention module to enable the effective attention of the feed-forward convolutional neural network.
arXiv Detail & Related papers (2021-10-01T16:44:06Z)
End2End Occluded Face Recognition by Masking Corrupted Features [82.27588990277192]
State-of-the-art general face recognition models do not generalize well to occluded face images. This paper presents a novel face recognition method that is robust to occlusions based on a single end-to-end deep neural network. Our approach, named FROM (Face Recognition with Occlusion Masks), learns to discover the corrupted features from the deep convolutional neural networks, and clean them by the dynamically learned masks.
arXiv Detail & Related papers (2021-08-21T09:08:41Z)
Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition [56.11054589916299]
We propose a landmark-guided attention branch to find and discard corrupted features from occluded regions. An attention map is first generated to indicate if a specific facial part is occluded and guide our model to attend to non-occluded regions. This results in more diverse and discriminative features, enabling the expression recognition system to recover even though the face is partially occluded.
arXiv Detail & Related papers (2020-05-12T20:42:55Z)
Fake face detection via adaptive manipulation traces extraction network [9.892936175042939]
We propose an adaptive manipulation traces extraction network (AMTEN) to suppress image content and highlight manipulation traces. AMTEN exploits an adaptive convolution layer to predict manipulation traces in the image, which are reused in subsequent layers to maximize manipulation artifacts. When detecting fake face images generated by various FIM techniques, AMTENnet achieves an average accuracy up to 98.52%, which outperforms the state-of-the-art works.
arXiv Detail & Related papers (2020-05-11T09:16:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.