Related papers: Generalizable Facial Expression Recognition

URL: http://arxiv.org/abs/2408.10614v1
Date: Tue, 20 Aug 2024 07:48:45 GMT
Title: Generalizable Facial Expression Recognition
Authors: Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng
Abstract summary: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model. This paper aims to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set.
Score: 41.639746139849564
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model, which might be infeasible in real-world deployment. In this paper, we aim to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set. Inspired by how humans first detect faces and then select expression features, we propose a novel FER pipeline to extract expression-related features from any given face images. Our method is based on the generalizable face features extracted by large models like CLIP. However, it is non-trivial to adapt the general features of CLIP for specific tasks like FER. To preserve the generalization ability of CLIP and the high precision of the FER model, we design a novel approach that learns sigmoid masks based on the fixed CLIP face features to extract expression features. To further improve the generalization ability on unseen test sets, we separate the channels of the learned masked features according to the expression classes to directly generate logits and avoid using the FC layer to reduce overfitting. We also introduce a channel-diverse loss to make the learned masks separated. Extensive experiments on five different FER datasets verify that our method outperforms SOTA FER methods by large margins. Code is available at https://github.com/zyh-uaiaaaa/Generalizable-FER.
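
Illustration: the abstract describes learning sigmoid masks on fixed CLIP features and splitting the masked channels by expression class to produce logits without an FC layer. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The function name `masked_group_logits`, the single linear mask generator (`weight`, `bias`), the sum pooling within each channel group, and the toy dimensions are all assumptions made for illustration; random vectors stand in for frozen CLIP features. The channel-diverse loss is omitted because the abstract does not give its form.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def masked_group_logits(features, weight, bias, num_classes):
    """Mask fixed features with a sigmoid gate, then pool channel groups into logits.

    features: (batch, dim) array standing in for frozen backbone features.
    weight, bias: parameters of a hypothetical linear mask generator.
    num_classes: number of expression classes; dim must be divisible by it.
    """
    batch, dim = features.shape
    if dim % num_classes != 0:
        raise ValueError("feature dimension must be divisible by num_classes")

    # Sigmoid mask in (0, 1), computed from the fixed features themselves.
    mask = sigmoid(features @ weight + bias)
    masked = mask * features

    # Assign one contiguous group of channels to each class and sum it
    # (sum pooling is an assumption) so no fully connected layer is needed.
    grouped = masked.reshape(batch, num_classes, dim // num_classes)
    return grouped.sum(axis=2), mask


# Toy usage: 4 samples, 14 channels, 7 expression classes (2 channels per class).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 14))
weight = rng.normal(scale=0.1, size=(14, 14))
bias = np.zeros(14)

logits, mask = masked_group_logits(features, weight, bias, num_classes=7)
print(logits.shape)
```

In this sketch only `weight` and `bias` would be trained; the features stay fixed, which mirrors the abstract's stated goal of preserving the generalization ability of the pretrained features.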

Related papers

Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection [23.48106270102081]
This paper tackles the challenge of detecting partially manipulated facial deepfakes. We leverage the Contrastive Language-Image Pre-training (CLIP) model, specifically its ViT-L/14 visual encoder. The proposed approach utilizes parameter-efficient fine-tuning (PEFT) techniques, such as LN-tuning, to adjust a small subset of the model's parameters.
arXiv Detail & Related papers (2025-03-25T14:10:54Z)
Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection [66.16595174895802]
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance. In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection [18.46382766430443]
A naively trained IFFD model is prone to catastrophic forgetting when new forgeries are integrated. We propose a Latent-space Incremental Detector (LID) that leverages SUR data to isolate and align distributions. For evaluation, we construct a more advanced and comprehensive benchmark tailored for IFFD.
arXiv Detail & Related papers (2024-11-18T09:18:36Z)
Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models. Recent studies extend SAM to few-shot semantic segmentation (FSS). We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users. A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing deep interpretable models to be trained. Our new strategy is generic and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia. Although existing approaches mainly capture face forgery patterns using image modality, other modalities like fine-grained noises and texts are not fully explored. We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public. ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance. This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z)
Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition [0.0]
The proposed method can detect occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. It involves three steps: First, the vision transformer (ViT)-based occlusion patch detector masks the occluded position by training only latent vectors from the unoccluded patches. Second, the hybrid reconstruction network generates the masking position as a complete image using the ViT and convolutional neural network (CNN). Last, the expression-relevant latent vector extractor retrieves and uses expression-related information from all latent vectors by applying a CNN-based class activation map.
arXiv Detail & Related papers (2023-07-21T07:56:32Z)
Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER). Our method exploits self-supervised pretraining to learn good feature representations from the target data. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
Loss Function Entropy Regularization for Diverse Decision Boundaries [0.0]
Loss Function Entropy Regularization (LFER) consists of regularization terms added to the pre-training and contrastive learning objective functions. We show that LFER can produce an ensemble in which each member has accuracy comparable to the state of the art, yet has varied latent decision boundaries.
arXiv Detail & Related papers (2022-04-30T10:16:41Z)
Face Presentation Attack Detection using Taskonomy Feature [26.343512092423985]
Presentation Attack Detection (PAD) methods are critical to ensure the security of Face Recognition Systems (FRSs). Existing PAD methods are highly dependent on the limited training set and cannot generalize well to unknown PAs. We propose to apply taskonomy (task taxonomy) from other face-related tasks to solve face PAD.
arXiv Detail & Related papers (2021-11-22T08:35:26Z)
BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions [25.00475462213752]
We present BioMetricNet, a novel framework for deep unconstrained face verification. The proposed approach does not impose any specific metric on facial features. It shapes the decision space by learning a latent representation in which matching and non-matching pairs are mapped onto clearly separated and well-behaved target distributions.
arXiv Detail & Related papers (2020-08-13T17:22:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.