Multi-modal Facial Action Unit Detection with Large Pre-trained Models
for the 5th Competition on Affective Behavior Analysis in-the-wild
- URL: http://arxiv.org/abs/2303.10590v3
- Date: Mon, 17 Apr 2023 20:17:55 GMT
- Title: Multi-modal Facial Action Unit Detection with Large Pre-trained Models
for the 5th Competition on Affective Behavior Analysis in-the-wild
- Authors: Yufeng Yin, Minh Tran, Di Chang, Xinrui Wang, Mohammad Soleymani
- Abstract summary: This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection.
We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models.
Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.
- Score: 7.905280782507726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial action unit detection has emerged as an important task within facial
expression analysis, aimed at detecting specific pre-defined, objective facial
expressions, such as lip tightening and cheek raising. This paper presents our
submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023
Competition for AU detection. We propose a multi-modal method for facial action
unit detection with visual, acoustic, and lexical features extracted from
large pre-trained models. To provide high-quality detail for visual feature
extraction, we apply super-resolution and face alignment to the training data
and show a potential performance gain. Our approach achieves an F1 score of
52.3% on the official validation set of the 5th ABAW Challenge.
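As a concrete illustration, below is a minimal sketch of the kind of late-fusion, multi-label AU classifier the abstract describes. The feature dimensions, the concatenation-based fusion, and the module names are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch: late fusion of visual, acoustic, and lexical features for
# multi-label AU detection. Dimensions and the fusion head are illustrative
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class FusionAUHead(nn.Module):
    def __init__(self, d_vis=768, d_aud=512, d_lex=768, n_aus=12, d_hid=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_vis + d_aud + d_lex, d_hid),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(d_hid, n_aus),  # one logit per action unit
        )

    def forward(self, f_vis, f_aud, f_lex):
        # Concatenate per-frame features from the three pre-trained encoders.
        fused = torch.cat([f_vis, f_aud, f_lex], dim=-1)
        return self.mlp(fused)  # raw logits; sigmoid gives per-AU probabilities

model = FusionAUHead()
logits = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 768))
# Multi-label training: an independent binary cross-entropy term per AU.
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 12)).float())
```

The reported 52.3% corresponds to the challenge metric: the F1 score averaged over the 12 annotated AUs, with each AU predicted independently.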
Related papers
- VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning [59.68917139718813]
We show that a strong off-the-shelf frozen pretrained visual encoder can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning.
By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting.
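Read as a hypothetical sketch (layer sizes, depth, and the action head are assumptions, not the paper's actual design), conditioning on frozen clip-level embeddings to predict the action of an unseen step might look like this:

```python
# Hypothetical sketch: predict the next step's action from frozen clip-level
# embeddings of observed steps. All sizes and the action head are assumptions.
import torch
import torch.nn as nn

class LatentStepPredictor(nn.Module):
    def __init__(self, d_emb=768, n_actions=1000, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_emb, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_emb, n_actions)

    def forward(self, observed):           # (B, T, d_emb), from a frozen encoder
        h = self.backbone(observed)        # contextualize the observed steps
        return self.action_head(h[:, -1])  # logits for the unseen next step

preds = LatentStepPredictor()(torch.randn(2, 5, 768))  # (2, 1000) action logits
```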
arXiv Detail & Related papers (2024-10-04T14:52:09Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- Task-adaptive Q-Face [75.15668556061772]
We propose a novel task-adaptive multi-task face analysis method named Q-Face.
Q-Face simultaneously performs multiple face analysis tasks with a unified model.
Our method achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation.
arXiv Detail & Related papers (2024-05-15T03:13:11Z)
- Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification.
We suggest easy-to-implement, training-free frameworks aimed at highlighting key facial features, in order to determine whether such features can serve as guides for the model.
The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
arXiv Detail & Related papers (2024-03-19T16:21:47Z)
- HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction [16.860963320038902]
We study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task.
We introduce several lightweight models based on the MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures, trained in multi-task scenarios to recognize facial expressions.
Our approach lets us significantly improve quality metrics on validation sets compared to existing non-ensemble techniques.
arXiv Detail & Related papers (2024-03-18T09:08:41Z)
- Multi-modal Facial Affective Analysis based on Masked Autoencoder [7.17338843593134]
We introduce our submission to the CVPR 2023 ABAW5 competition: Affective Behavior Analysis in-the-wild.
Our approach involves several key components. First, we utilize the visual information from a Masked Autoencoder (MAE) model that has been pre-trained on a large-scale face image dataset in a self-supervised manner.
Our approach achieves impressive results in the ABAW5 competition, with an average F1 score of 55.49% and 41.21% in the AU and EXPR tracks, respectively.
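As a rough sketch of reusing a self-supervised MAE encoder for the AU and EXPR tracks (the timm checkpoint and the linear heads below are stand-ins, not the authors' face-pretrained model):

```python
# Rough sketch: linear AU/EXPR heads on top of an MAE-pretrained ViT encoder.
# The ImageNet-MAE checkpoint is a stand-in for the authors' face-pretrained one.
import timm
import torch
import torch.nn as nn

encoder = timm.create_model("vit_base_patch16_224.mae", pretrained=True,
                            num_classes=0)          # features only, no head
au_head = nn.Linear(encoder.num_features, 12)       # 12 AU logits (multi-label)
expr_head = nn.Linear(encoder.num_features, 8)      # 8 expression classes

faces = torch.randn(4, 3, 224, 224)                 # aligned face crops
feats = encoder(faces)                              # (4, 768) pooled features
au_logits, expr_logits = au_head(feats), expr_head(feats)
```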
arXiv Detail & Related papers (2023-03-20T03:58:03Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six datasets with very distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices [7.056222499095849]
We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet.
Our approach may be implemented even for video analytics on mobile devices.
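A sketch of what frame-level inference with a single backbone could look like; the ImageNet EfficientNet checkpoint and the TorchScript export are assumptions standing in for the authors' AffectNet-pretrained model and deployment path:

```python
# Sketch: per-frame feature extraction with one EfficientNet backbone, then a
# TorchScript export as one common route to on-device inference. The ImageNet
# weights stand in for the AffectNet pre-training used by the authors.
import timm
import torch

backbone = timm.create_model("efficientnet_b0", pretrained=True, num_classes=0)
backbone.eval()

frames = torch.randn(8, 3, 224, 224)   # a batch of video frames
with torch.no_grad():
    feats = backbone(frames)           # (8, 1280) per-frame features

scripted = torch.jit.trace(backbone, frames[:1])
scripted.save("emotion_backbone.pt")   # loadable from a mobile runtime
```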
arXiv Detail & Related papers (2022-03-25T03:53:27Z)
- Multi-modal Multi-label Facial Action Unit Detection with Transformer [7.30287060715476]
This paper describes our submission to the third Affective Behavior Analysis (ABAW) 2022 competition.
We propose a transformer-based model to detect facial action units (FAUs) in video.
arXiv Detail & Related papers (2022-03-24T18:59:31Z)
- Robust and Precise Facial Landmark Detection by Self-Calibrated Pose Attention Network [73.56802915291917]
We propose a semi-supervised framework to achieve more robust and precise facial landmark detection.
A Boundary-Aware Landmark Intensity (BALI) field is proposed to model more effective facial shape constraints.
A Self-Calibrated Pose Attention (SCPA) model is designed to provide a self-learned objective function that enforces intermediate supervision.
arXiv Detail & Related papers (2021-12-23T02:51:08Z)
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
Among our main findings: unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)