Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks: Application to 7th ABAW Challenge
- URL: http://arxiv.org/abs/2407.12390v3
- Date: Thu, 5 Sep 2024 11:35:21 GMT
- Title: Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks: Application to 7th ABAW Challenge
- Authors: Josep Cabacas-Maso, Elena Ortega-Beltrán, Ismael Benito-Altamirano, Carles Ventura
- Abstract summary: We present our contribution to the 7th ABAW challenge at ECCV 2024.
By utilizing a Dual-Direction Attention Mixed Feature Network (DDAMFN) for multitask facial expression recognition, we achieve results far beyond the proposed baseline for the Multi-Task ABAW challenge.
- Score: 1.0374615809135401
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present our contribution to the 7th ABAW challenge at ECCV 2024. By utilizing a Dual-Direction Attention Mixed Feature Network (DDAMFN) for multitask facial expression recognition, we achieve results far beyond the proposed baseline for the Multi-Task ABAW challenge. Our proposal uses the well-known DDAMFN architecture as a base to effectively predict valence-arousal, emotion recognition, and facial action units. We demonstrate the architecture's ability to handle these tasks simultaneously, providing insights into its design and the rationale behind it. Additionally, we compare our multitask results with independent single-task performance.
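As a rough, hypothetical illustration of the multitask setup the abstract describes, the sketch below attaches three task heads (valence-arousal regression, eight-class expression classification, and twelve-unit action-unit detection) to a shared feature extractor. The placeholder backbone, the 512-dimensional feature size, and the head shapes are assumptions for illustration only; in the paper, the shared features would come from DDAMFN itself.
```python
import torch
import torch.nn as nn

class MultiTaskFERSketch(nn.Module):
    """Shared backbone with three task heads for the Multi-Task ABAW
    setting. The tiny placeholder backbone, 512-d features, and head
    shapes are assumptions; the paper's features come from DDAMFN."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for DDAMFN
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.va_head = nn.Linear(feat_dim, 2)    # valence, arousal
        self.expr_head = nn.Linear(feat_dim, 8)  # 8 expression classes
        self.au_head = nn.Linear(feat_dim, 12)   # 12 action units

    def forward(self, x):
        f = self.backbone(x)
        return {
            "va": torch.tanh(self.va_head(f)),   # bounded regression in [-1, 1]
            "expr": self.expr_head(f),           # logits for cross-entropy
            "au": self.au_head(f),               # logits for BCEWithLogitsLoss
        }

model = MultiTaskFERSketch()
outputs = model(torch.randn(4, 3, 112, 112))
print({k: tuple(v.shape) for k, v in outputs.items()})
```
Training such a model would typically sum a per-task loss for each head; the single-task baselines mentioned in the abstract correspond to training each head with its own backbone instead.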
Related papers
- Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks and CLIP: Application to 8th ABAW Challenge [1.0374615809135401]
We present our contribution to the 8th ABAW challenge at CVPR 2025.
We tackle valence-arousal estimation, emotion recognition, and facial action unit detection as three independent challenges.
Our approach leverages the well-known Dual-Direction Attention Mixed Feature Network (DDAMFN) for all three tasks, achieving results that surpass the proposed baselines.
arXiv Detail & Related papers (2025-03-15T21:03:03Z)
- Affective Behaviour Analysis via Progressive Learning [23.455163723584427]
We present our methods and experimental results for the two competition tracks.
We train a Masked Autoencoder in a self-supervised manner to attain high-quality facial features.
We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions (see the sketch after this entry).
arXiv Detail & Related papers (2024-07-24T02:24:21Z)
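A minimal sketch of such a curriculum, assuming a generic classifier and two label pools (basic expressions first, compound expressions added in a second stage); the loaders, optimizer, and stage lengths are illustrative guesses, not the paper's training recipe:
```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader

def train_with_curriculum(model, basic_ds, compound_ds, epochs_per_stage=5):
    """Two-stage curriculum: stage 1 trains on basic expressions only;
    stage 2 continues on basic + compound expressions together."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    stages = [basic_ds, ConcatDataset([basic_ds, compound_ds])]
    for stage, dataset in enumerate(stages, start=1):
        loader = DataLoader(dataset, batch_size=64, shuffle=True)
        for _ in range(epochs_per_stage):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        print(f"finished curriculum stage {stage}")
```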
- HSEmotion Team at the 7th ABAW Challenge: Multi-Task Learning and Compound Facial Expression Recognition [16.860963320038902]
We describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition.
We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in multi-task settings.
We ensure the privacy awareness of our techniques by using lightweight neural network architectures.
arXiv Detail & Related papers (2024-07-18T05:47:49Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets (a rank-1-expert sketch follows this entry).
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
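The abstract snippet does not spell out the architecture; below is a loose, assumed sketch of the general rank-1-expert idea: each expert adds an outer-product (rank-1) update u_i v_i^T to a frozen linear layer, mixed by a learned router. Class and parameter names are hypothetical, and the dense softmax routing is a simplification, not the paper's design.
```python
import torch
import torch.nn as nn

class Rank1MoELinear(nn.Module):
    """Mixture of rank-1 experts over a frozen linear layer: expert i
    contributes u_i * (v_i . x), weighted by a per-token router."""

    def __init__(self, base: nn.Linear, num_experts: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)        # frozen pretrained weight
        d_in, d_out = base.in_features, base.out_features
        self.u = nn.Parameter(torch.zeros(num_experts, d_out))  # zero-init: no initial shift
        self.v = nn.Parameter(torch.randn(num_experts, d_in) * 0.01)
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):                             # x: (..., d_in)
        gate = torch.softmax(self.router(x), dim=-1)  # (..., E) mixing weights
        proj = torch.einsum("...d,ed->...e", x, self.v)        # v_i . x
        delta = torch.einsum("...e,...e,eo->...o", gate, proj, self.u)
        return self.base(x) + delta
```
Each expert adds only d_in + d_out parameters, which is what makes a mixture of rank-1 experts parameter-efficient compared with full-rank experts.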
- Facial Affective Behavior Analysis with Instruction Tuning [58.332959295770614]
Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images.
Traditional approaches primarily deploy models to discriminate among discrete emotion categories and lack the fine granularity and reasoning capability needed for complex facial behaviors.
We introduce an instruction-following dataset for two FABA tasks, emotion and action unit recognition, and a benchmark FABA-Bench with a new metric considering both recognition and generation ability.
We also introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM.
arXiv Detail & Related papers (2024-04-07T19:23:28Z)
- Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge [41.32053075381269]
The task of ABAW is to predict frame-level emotion descriptors from videos.
We propose a novel end-to-end architecture to achieve full integration of different types of information.
arXiv Detail & Related papers (2022-07-23T01:48:51Z)
- Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition [103.62363658053557]
We propose a Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers.
We also introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between two interactive paths of Dual-AI.
Our Dual-AI can boost group activity recognition by fusing distinct discriminative features of different actors.
arXiv Detail & Related papers (2022-04-05T12:17:40Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit pose estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher-quality modality-shared and ID-related features (a sketch of the joint objective follows this entry).
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves over state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
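A minimal sketch of joint training with an auxiliary loss, assuming a shared encoder with an ID head plus a separate pose head; the attribute names, the MSE keypoint loss, and the weighting factor are illustrative assumptions rather than the paper's exact objective:
```python
import torch
import torch.nn.functional as F

def joint_step(model, pose_head, batch, optimizer, lam=0.5):
    """One optimization step where pose estimation regularizes the shared
    encoder. `model.encoder` / `model.id_head` are hypothetical stand-ins."""
    images, ids, keypoints = batch
    feats = model.encoder(images)                        # shared features
    id_loss = F.cross_entropy(model.id_head(feats), ids) # main ReID objective
    pose_loss = F.mse_loss(pose_head(feats), keypoints)  # auxiliary task
    loss = id_loss + lam * pose_loss                     # weighted joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```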
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked-text prediction (see the sketch after this entry).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
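To make the prompt idea concrete, here is a hedged sketch using a generic BERT masked language model: the input is wrapped in a prompt ending in a [MASK] token, and the mask logits are scored against a small set of emotion label words. The prompt template, label words, and model are placeholder assumptions; MEmoBERT's actual multimodal inputs and verbalizers are not reproduced here.
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Generic BERT stands in for MEmoBERT; prompt and labels are assumed.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

label_words = ["happy", "sad", "angry", "neutral"]  # assumed verbalizer
label_ids = [tokenizer.convert_tokens_to_ids(w) for w in label_words]

text = "I just got the job offer!"
prompt = f"{text} I feel {tokenizer.mask_token}."    # classification as mask filling
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                  # (1, seq_len, vocab)
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
scores = logits[0, mask_pos, label_ids]              # score each label word
print(label_words[scores.argmax().item()])
```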
- Prior Aided Streaming Network for Multi-task Affective Recognition at the 2nd ABAW2 Competition [9.188777864190204]
We introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW2) Competition.
To deal with the different emotion representations, we propose a multi-task streaming network.
We leverage an advanced facial expression embedding as prior knowledge.
arXiv Detail & Related papers (2021-07-08T09:35:08Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset (a sketch of an SE block follows this entry).
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
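For reference, a standard Squeeze-and-Excitation block (Hu et al., 2018) in PyTorch; the reduction ratio of 16 is the common default, not necessarily the setting used in that paper:
```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: pool channel statistics, learn per-channel
    gates, and reweight the feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excitation: channel gates
        return x * w                           # reweight feature maps
```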
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)