SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial
Expression Recognition in the Wild
- URL: http://arxiv.org/abs/2303.07648v1
- Date: Tue, 14 Mar 2023 06:30:55 GMT
- Title: SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial
Expression Recognition in the Wild
- Authors: Jiyong Moon and Seongsik Park
- Abstract summary: We propose a self-supervised simple facial landmark encoding (SimFLE) method that can learn effective encoding of facial landmarks.
We introduce a novel FaceMAE module for this purpose.
Experimental results on several FER-W benchmarks prove that the proposed SimFLE is superior in facial landmark localization.
- Score: 3.4798852684389963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key issues in facial expression recognition in the wild (FER-W) is
that curating large-scale labeled facial images is challenging due to the
inherent complexity and ambiguity of facial images. Therefore, in this paper,
we propose a self-supervised simple facial landmark encoding (SimFLE) method
that can learn effective encoding of facial landmarks, which are important
features for improving the performance of FER-W, without expensive labels.
Specifically, we introduce novel FaceMAE module for this purpose. FaceMAE
reconstructs masked facial images with elaborately designed semantic masking.
Unlike previous random masking, semantic masking is conducted based on channel
information processed in the backbone, so rich semantics of channels can be
explored. Additionally, the semantic masking process is fully trainable,
enabling FaceMAE to guide the backbone to learn spatial details and contextual
properties of fine-grained facial landmarks. Experimental results on several
FER-W benchmarks show that the proposed SimFLE is superior in facial landmark
localization and achieves noticeably improved performance compared to the
supervised baseline and other self-supervised methods.
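The abstract describes FaceMAE only at a high level, so the following is a rough, hypothetical sketch of how channel-guided semantic masking could be wired into a masked-autoencoder pipeline. The module name SemanticMasking, the 1x1 scoring convolution, and the hard top-k patch selection are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): channel-guided "semantic masking"
# for a masked-autoencoder setup, loosely following the abstract's description of
# FaceMAE. Module names, shapes, and the hard top-k selection are assumptions.
import torch
import torch.nn as nn


class SemanticMasking(nn.Module):
    """Scores spatial patches from backbone channel information and masks the top-scoring ones."""

    def __init__(self, in_channels: int, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # A learnable 1x1 convolution aggregates channel information into one
        # saliency score per spatial location, so the masking criterion itself
        # has trainable parameters (the abstract states the process is trainable).
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map; each location corresponds to a patch.
        scores = self.score(feat).flatten(1)            # (B, H*W)
        num_mask = int(self.mask_ratio * scores.shape[1])
        # Assumption: mask the most salient patches. A straight-through or
        # Gumbel-style relaxation would be needed to back-propagate through
        # this hard selection; it is omitted here for brevity.
        idx = scores.topk(num_mask, dim=1).indices      # (B, num_mask)
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask.scatter_(1, idx, True)                     # True = patch is masked
        return mask
```

Presumably, in a full pipeline the masked patches would be reconstructed by a lightweight decoder trained with a pixel-level reconstruction loss, MAE-style, while the backbone is shared with the FER head.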
Related papers
- LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition [37.4550614524874]
We focus on learning facial representations that can be adapted to train effective face recognition models.
We explore the learning strategy of unlabeled facial images through self-supervised pretraining.
Our method achieves significant improvement over the state-of-the-art on multiple face recognition benchmarks.
arXiv Detail & Related papers (2024-03-13T01:07:55Z)
- Self-Supervised Facial Representation Learning with Facial Region Awareness [13.06996608324306]
Self-supervised pre-training has been proven to be effective in learning transferable representations that benefit various visual tasks.
Recent efforts toward this goal are limited to treating each face image as a whole.
We propose a novel self-supervised facial representation learning framework to learn consistent global and local facial representations.
arXiv Detail & Related papers (2024-03-04T15:48:56Z)
- A Generalist FaceX via Learning Unified Facial Representation [77.74407008931486]
FaceX is a novel facial generalist model capable of handling diverse facial tasks simultaneously.
Our versatile FaceX achieves competitive performance compared to elaborate task-specific models on popular facial editing tasks.
arXiv Detail & Related papers (2023-12-31T17:41:48Z)
- Seeing through the Mask: Multi-task Generative Mask Decoupling Face Recognition [47.248075664420874]
Current general face recognition systems suffer from serious performance degradation when encountering occluded scenes.
This paper proposes a Multi-task gEnerative mask dEcoupling face Recognition (MEER) network to jointly handle these two tasks.
We first present a novel mask decoupling module to disentangle mask and identity information, which makes the network obtain purer identity features from visible facial components.
arXiv Detail & Related papers (2023-11-20T03:23:03Z)
- FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping [62.38898610210771]
We present a new single-stage method for subject face swapping and identity transfer, named FaceDancer.
We make two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR).
arXiv Detail & Related papers (2022-10-19T11:31:38Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- Foreground-guided Facial Inpainting with Fidelity Preservation [7.5089719291325325]
We propose a foreground-guided facial inpainting framework that can extract and generate facial features using convolutional neural network layers.
Specifically, we propose a new loss function with semantic reasoning capability for facial expressions and for natural and unnatural features (e.g., make-up).
Our proposed method achieved comparable quantitative results when compared to the state of the art, but qualitatively it demonstrated high-fidelity preservation of facial components.
arXiv Detail & Related papers (2021-05-07T15:50:58Z)
- DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN).
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.