Frame-level Prediction of Facial Expressions, Valence, Arousal and
Action Units for Mobile Devices
- URL: http://arxiv.org/abs/2203.13436v1
- Date: Fri, 25 Mar 2022 03:53:27 GMT
- Title: Frame-level Prediction of Facial Expressions, Valence, Arousal and
Action Units for Mobile Devices
- Authors: Andrey V. Savchenko
- Abstract summary: We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet.
Our approach can be deployed even for video analytics on mobile devices.
- Score: 7.056222499095849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the problem of real-time video-based facial
emotion analytics, namely, facial expression recognition, prediction of valence
and arousal, and detection of action units. We propose a novel frame-level
emotion recognition algorithm that extracts facial features with a single
EfficientNet model pre-trained on AffectNet. As a result, our approach can be
deployed even for video analytics on mobile devices. Experimental results on
the large-scale Aff-Wild2 database from the third Affective Behavior Analysis
in-the-wild (ABAW) Competition demonstrate that our simple model significantly
outperforms the VggFace baseline. In particular, our method achieves
performance measures that are 0.15-0.2 higher on the validation sets of the
uni-task Expression Classification, Valence-Arousal Estimation, and Action Unit
Detection sub-challenges. Due to its simplicity, our approach may serve as a
new baseline for all four sub-challenges.
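As a rough illustration of the pipeline described in the abstract, the sketch below wires a single pretrained EfficientNet backbone to three lightweight per-frame heads. It is a minimal sketch only: the timm ImageNet weights stand in for the AffectNet-trained model, and the head sizes (8 expressions, 12 action units) and all names are hypothetical placeholders, not the authors' released code.

```python
# Minimal sketch of frame-level prediction from a single shared backbone.
# Assumptions: timm's ImageNet EfficientNet-B0 substitutes for the AffectNet
# model; head dimensions and names are illustrative, not the paper's release.
import torch
import timm


class FrameLevelEmotionModel(torch.nn.Module):
    def __init__(self, num_expressions=8, num_action_units=12):
        super().__init__()
        # Shared backbone: one EfficientNet used as a frame-level feature extractor.
        self.backbone = timm.create_model("efficientnet_b0", pretrained=True, num_classes=0)
        feat_dim = self.backbone.num_features  # 1280 for EfficientNet-B0
        # Task-specific linear heads on top of the shared per-frame features.
        self.expression_head = torch.nn.Linear(feat_dim, num_expressions)
        self.va_head = torch.nn.Linear(feat_dim, 2)  # valence, arousal in [-1, 1]
        self.au_head = torch.nn.Linear(feat_dim, num_action_units)

    def forward(self, frames):  # frames: (batch, 3, 224, 224) aligned face crops
        features = self.backbone(frames)  # (batch, feat_dim)
        return {
            "expression_logits": self.expression_head(features),
            "valence_arousal": torch.tanh(self.va_head(features)),
            "au_logits": self.au_head(features),  # apply sigmoid for AU presence
        }


model = FrameLevelEmotionModel().eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # one face crop from one frame
print({k: v.shape for k, v in out.items()})
```

Because the only heavy component is the single shared backbone, the same per-frame features serve expression, valence-arousal, and action-unit prediction, which is what makes deployment on mobile devices (for instance after conversion to TensorFlow Lite or Core ML) plausible.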
Related papers
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, and thus generalize more strongly.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition [16.860963320038902]
We describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition.
We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings.
We ensure the privacy awareness of our techniques by using lightweight neural network architectures.
arXiv Detail & Related papers (2024-07-18T05:47:49Z)
- HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction [16.860963320038902]
We study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task.
We introduce several lightweight models based on MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures trained in multi-task scenarios to recognize facial expressions.
Our approach lets us significantly improve quality metrics on validation sets compared to existing non-ensemble techniques.
arXiv Detail & Related papers (2024-03-18T09:08:41Z)
- Robust Light-Weight Facial Affective Behavior Recognition with CLIP [12.368133562194267]
Human affective behavior analysis aims to delve into human expressions and behaviors to deepen our understanding of human emotions.
Existing approaches in expression classification and AU detection often necessitate complex models and substantial computational resources.
We introduce the first lightweight framework adept at efficiently tackling both expression classification and AU detection.
arXiv Detail & Related papers (2024-03-14T23:21:55Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while having much lower complexity; a generic sketch of Monte Carlo Dropout inference follows this entry.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
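As referenced in the entry above, the following is a generic Monte Carlo Dropout sketch, assuming a PyTorch classifier that contains dropout layers; it illustrates only the uncertainty-estimation idea, not the paper's progressive spatio-temporal bilinear network.

```python
# Generic Monte Carlo Dropout inference: keep dropout active at test time and
# aggregate several stochastic forward passes into a mean prediction plus an
# uncertainty estimate. A minimal sketch, not the paper's actual network.
import torch


def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, n_samples: int = 30):
    model.train()  # keeps dropout enabled; no gradients are computed below
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # (n_samples, batch, num_classes)
    # Predictive mean and per-class spread across the stochastic passes.
    return probs.mean(dim=0), probs.std(dim=0)


# Hypothetical usage with a toy classifier over 68 (x, y) facial landmarks
# (136 inputs) and 7 expression classes:
toy = torch.nn.Sequential(
    torch.nn.Linear(136, 64), torch.nn.ReLU(),
    torch.nn.Dropout(0.5), torch.nn.Linear(64, 7),
)
mean_probs, uncertainty = mc_dropout_predict(toy, torch.randn(4, 136))
```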
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this aim, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Expression Recognition Analysis in the Wild [9.878384185493623]
We report details and experimental results about a facial expression recognition method based on state-of-the-art methods.
We fine-tuned a SENet deep learning architecture pre-trained on the well-known VGGFace2 dataset.
This paper is also required by the Affective Behavior Analysis in-the-wild (ABAW) competition in order to evaluate this approach on the test set.
arXiv Detail & Related papers (2021-01-22T17:28:31Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present F2ED, a novel Fine-grained Facial Expression Database.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), which learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)