Generalizable Facial Expression Recognition
- URL: http://arxiv.org/abs/2408.10614v1
- Date: Tue, 20 Aug 2024 07:48:45 GMT
- Title: Generalizable Facial Expression Recognition
- Authors: Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng
- Abstract summary: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set.
Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model.
This paper aims to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set.
- Score: 41.639746139849564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model, which might be infeasible in real-world deployment. In this paper, we aim to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set. Inspired by how humans first detect faces and then select expression features, we propose a novel FER pipeline to extract expression-related features from any given face images. Our method is based on the generalizable face features extracted by large models like CLIP. However, it is non-trivial to adapt the general features of CLIP for specific tasks like FER. To preserve the generalization ability of CLIP and the high precision of the FER model, we design a novel approach that learns sigmoid masks based on the fixed CLIP face features to extract expression features. To further improve the generalization ability on unseen test sets, we separate the channels of the learned masked features according to the expression classes to directly generate logits and avoid using the FC layer to reduce overfitting. We also introduce a channel-diverse loss to make the learned masks separated. Extensive experiments on five different FER datasets verify that our method outperforms SOTA FER methods by large margins. Code is available in https://github.com/zyh-uaiaaaa/Generalizable-FER.
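The masking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that) and it rests on several assumptions: random vectors stand in for the fixed CLIP face features, the mask is a single per-channel parameter vector rather than whatever the paper actually learns, each class logit is the plain sum of its channel group, and the diversity term is a hypothetical cosine-similarity penalty chosen only to show what "separated masks" could mean. The names `masked_class_logits` and `channel_diversity_penalty` are invented for this example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping reals into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def masked_class_logits(features, mask_params, num_classes):
    """Turn fixed features into class logits without a fully connected layer.

    features:    (batch, dim) array, a stand-in for frozen CLIP face features.
    mask_params: (dim,) array of mask parameters (assumed form, see lead-in).
    num_classes: number of expression classes; dim must be divisible by it.
    """
    batch, dim = features.shape
    if dim % num_classes != 0:
        raise ValueError("dim must be divisible by num_classes")
    mask = sigmoid(mask_params)        # soft mask in (0, 1), one value per channel
    masked = features * mask           # broadcast the mask over the batch
    # Split the channels into one contiguous group per class and sum each group.
    groups = masked.reshape(batch, num_classes, dim // num_classes)
    return groups.sum(axis=2)          # shape (batch, num_classes)


def channel_diversity_penalty(mask_params, num_classes):
    """Hypothetical diversity term: mean pairwise cosine similarity between
    the per-class mask groups. Lower values mean more separated masks."""
    mask = sigmoid(mask_params).reshape(num_classes, -1)
    unit = mask / np.linalg.norm(mask, axis=1, keepdims=True)
    sim = unit @ unit.T
    off_diagonal = sim[~np.eye(num_classes, dtype=bool)]
    return float(off_diagonal.mean())


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 14))   # 4 samples, 14 channels (toy sizes)
mask_params = np.zeros(14)                # sigmoid(0) = 0.5 for every channel
logits = masked_class_logits(features, mask_params, num_classes=7)
print(logits.shape)
print(channel_diversity_penalty(mask_params, num_classes=7))
```

Because the logits come directly from channel groups, the only trainable quantities in this sketch are the mask parameters, which is one way to read the abstract's point about avoiding an FC layer to reduce overfitting.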
Related papers
- Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection [66.16595174895802]
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance.
In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
- Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection [18.46382766430443]
A naively trained IFFD model is prone to catastrophic forgetting when new forgeries are integrated.
We propose a Latent-space Incremental Detector (LID) that leverages SUR data to isolate and align distributions.
For evaluation, we construct a more advanced and comprehensive benchmark tailored for IFFD.
arXiv Detail & Related papers (2024-11-18T09:18:36Z)
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate action unit (AU) cues into classifier training, allowing deep interpretable models to be trained.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using image modality, other modalities like fine-grained noises and texts are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition [0.0]
The proposed method can detect occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy.
It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches.
Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN).
Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
arXiv Detail & Related papers (2023-07-21T07:56:32Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Loss Function Entropy Regularization for Diverse Decision Boundaries [0.0]
Loss Function Entropy Regularization (LFER) consists of regularization terms to be added to the pre-training and contrastive learning objective functions.
We show that LFER can produce an ensemble in which each member has accuracy comparable to the state of the art, while the members have varied latent decision boundaries.
arXiv Detail & Related papers (2022-04-30T10:16:41Z)
- Face Presentation Attack Detection using Taskonomy Feature [26.343512092423985]
Presentation Attack Detection (PAD) methods are critical to ensure the security of Face Recognition Systems (FRSs).
Existing PAD methods are highly dependent on the limited training set and cannot generalize well to unknown PAs.
We propose to apply taskonomy (task taxonomy) from other face-related tasks to solve face PAD.
arXiv Detail & Related papers (2021-11-22T08:35:26Z)
- BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions [25.00475462213752]
We present BioMetricNet, a novel framework for deep unconstrained face verification.
The proposed approach does not impose any specific metric on facial features.
It shapes the decision space by learning a latent representation in which matching and non-matching pairs are mapped onto clearly separated and well-behaved target distributions.
arXiv Detail & Related papers (2020-08-13T17:22:46Z)