Related papers: HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction

URL: http://arxiv.org/abs/2403.11590v1
Date: Mon, 18 Mar 2024 09:08:41 GMT
Title: HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction
Authors: Andrey V. Savchenko
Abstract summary: We study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. We introduce several lightweight models based on MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures trained in multi-task scenarios to recognize facial expressions. Our approach lets us significantly improve quality metrics on validation sets compared to existing non-ensemble techniques.
Score: 16.860963320038902
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This article presents our results for the sixth Affective Behavior Analysis in-the-wild (ABAW) competition. To improve the trustworthiness of facial analysis, we study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. In particular, we introduce several lightweight models based on MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures trained in multi-task scenarios to recognize facial expressions, valence, and arousal on static photos. These neural networks extract frame-level features fed into a simple classifier, e.g., linear feed-forward neural network, to predict emotion intensity, compound expressions, action units, facial expressions, and valence/arousal. Experimental results for five tasks from the sixth ABAW challenge demonstrate that our approach lets us significantly improve quality metrics on validation sets compared to existing non-ensemble techniques.
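The pipeline described in the abstract (fixed frame-level features passed to a simple linear classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic 4-dimensional vectors stand in for embeddings from a frozen pre-trained network, and the names `train_linear_head` and `predict`, the two toy classes, and the hyperparameters are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, biases, x):
    """Linear layer: one logit per class for feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_linear_head(features, labels, num_classes, epochs=50, lr=0.5):
    """Fit a single linear layer (softmax regression) with plain SGD.

    The feature vectors are treated as fixed inputs, mirroring the idea of
    a frozen feature extractor that is never fine-tuned.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    s = scores(weights, biases, x)
    return max(range(len(s)), key=s.__getitem__)


# Synthetic stand-in for frame-level embeddings (hypothetical data):
# two classes clustered around different centers.
random.seed(0)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
features, labels = [], []
for label, center in enumerate(centers):
    for _ in range(20):
        features.append([c + random.gauss(0.0, 0.1) for c in center])
        labels.append(label)

weights, biases = train_linear_head(features, labels, num_classes=2)
accuracy = sum(predict(weights, biases, x) == y
               for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

In a real setting the synthetic vectors would be replaced by embeddings computed once per frame, so only the small linear head is trained for each downstream task.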

Related papers

HSEmotion Team at ABAW-8 Competition: Audiovisual Ambivalence/Hesitancy, Emotional Mimicry Intensity and Facial Expression Recognition [16.860963320038902]
This article presents our results for the eighth Affective Behavior Analysis in-the-Wild (ABAW) competition. We combine facial emotional descriptors extracted by pre-trained models with acoustic features and embeddings of texts recognized from speech. The video-level prediction of emotional mimicry intensity is implemented by simply aggregating frame-level features and training a multi-layered perceptron.
arXiv Detail & Related papers (2025-03-13T14:21:46Z)
HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition [16.860963320038902]
We describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition. We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings. We ensure the privacy-awareness of our techniques by using the lightweight architectures of neural networks.
arXiv Detail & Related papers (2024-07-18T05:47:49Z)
Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models [49.3179290313959]
The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset.
arXiv Detail & Related papers (2024-04-18T15:28:34Z)
Computer Vision Estimation of Emotion Reaction Intensity in the Wild [1.5481864635049696]
We describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge. We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity.
arXiv Detail & Related papers (2023-03-19T19:09:41Z)
CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory AdaptatiOn (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets. CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images [7.056222499095849]
We present the results of the HSE-NN team in the 4th competition on Affective Behavior Analysis in-the-wild (ABAW). The novel multi-task EfficientNet model is trained for simultaneous recognition of facial expressions. The resulting MT-EmotiEffNet extracts visual features that are fed into simple feed-forward neural networks.
arXiv Detail & Related papers (2022-07-19T18:43:14Z)
Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices [7.056222499095849]
We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet. Our approach may be implemented even for video analytics on mobile devices.
arXiv Detail & Related papers (2022-03-25T03:53:27Z)
A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task. We ground our intuition on the observation that face images are often acquired at different resolutions. To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild. We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train. We formalize the FaceChannel, a light-weight neural network that has much fewer parameters than common deep neural networks. We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
The FaceChannel: A Light-weight Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic FER are based on very deep neural networks that are difficult to train. We formalize the FaceChannel, a light-weight neural network that has much fewer parameters than common deep neural networks. We demonstrate how the FaceChannel achieves a comparable, if not better, performance, as compared to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-04-17T12:03:14Z)
Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED. It includes more than 200k images with 54 facial expressions from 119 persons. Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several tasks of few-shot expression learning. We propose a unified task-driven framework - Compositional Generative Adversarial Network (Comp-GAN) learning to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.