Related papers: Design of an Expression Recognition Solution Based on the Global Channel-Spatial Attention Mechanism and Proportional Criterion Fusion

Design of an Expression Recognition Solution Based on the Global Channel-Spatial Attention Mechanism and Proportional Criterion Fusion

URL: http://arxiv.org/abs/2503.11935v3
Date: Fri, 21 Mar 2025 09:31:13 GMT
Title: Design of an Expression Recognition Solution Based on the Global Channel-Spatial Attention Mechanism and Proportional Criterion Fusion
Authors: Jun Yu, Yang Zheng, Lei Wang, Yongqi Wang, Shengfan Xu,
Abstract summary: This paper aims to introduce the method we will adopt in the 8th Affective and Behavioral Analysis in the Wild (ABAW) Competition.<n>Based on the residual hybrid convolutional neural network and the multi-branch convolutional neural network respectively, we design feature extraction models for image and audio sequences.<n>In the facial expression recognition task of the 8th ABAW Competition, our method ranked third on the official validation set.
Score: 11.506800500772734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial expression recognition is a challenging classification task that holds broad application prospects in the field of human-computer interaction. This paper aims to introduce the method we will adopt in the 8th Affective and Behavioral Analysis in the Wild (ABAW) Competition, which will be held during the Conference on Computer Vision and Pattern Recognition (CVPR) in 2025.First of all, we apply the frequency masking technique and the method of extracting data at equal time intervals to conduct targeted processing on the original videos. Then, based on the residual hybrid convolutional neural network and the multi-branch convolutional neural network respectively, we design feature extraction models for image and audio sequences. In particular, we propose a global channel-spatial attention mechanism to enhance the features initially extracted from both the audio and image modalities respectively.Finally, we adopt a decision fusion strategy based on the proportional criterion to fuse the classification results of the two single modalities, obtain an emotion probability vector, and output the final emotional classification. We also design a coarse - fine granularity loss function to optimize the performance of the entire network, which effectively improves the accuracy of facial expression recognition.In the facial expression recognition task of the 8th ABAW Competition, our method ranked third on the official validation set. This result fully confirms the effectiveness and competitiveness of the method we have proposed.

Related papers

A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries [4.313893060699182]
Face forgery techniques pose significant threats to AI security, digital media integrity, and public trust.<n>We propose a novel dual-branch convolutional neural network for face forgery detection.<n>We evaluate our model on the DiFF benchmark, which includes forged images generated from four representative methods.
arXiv Detail & Related papers (2025-10-28T17:06:40Z)
Multi-modal Speech Emotion Recognition via Feature Distribution Adaptation Network [12.200776612016698]
We propose a novel deep inductive transfer learning framework, named feature distribution adaptation network. Our method aims to use deep transfer learning strategies to align visual and audio feature distributions to obtain consistent representation of emotion.
arXiv Detail & Related papers (2024-10-29T13:13:30Z)
A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks [1.1118946307353794]
Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. We propose a visualization method that intuitively reflects the training outcomes of models by visualizing the prediction results on datasets.
arXiv Detail & Related papers (2024-04-19T03:12:17Z)
Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification. We suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
arXiv Detail & Related papers (2024-03-19T16:21:47Z)
Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling [8.809586885539002]
This paper presents our approach for the upcoming 6th Affective Behavior Analysis in-the-Wild (ABAW) competition. In the 6th ABAW competition, our method achieved outstanding results on the official validation set.
arXiv Detail & Related papers (2024-03-18T16:36:54Z)
Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior [13.198105709331617]
We propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM. This is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation. We show that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods.
arXiv Detail & Related papers (2023-10-05T07:44:49Z)
Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures. Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z)
CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptati On (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets. CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
Audio-visual multi-channel speech separation, dereverberation and recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach. The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches. Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z)
A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition [7.80238628278552]
We propose a novel cross-modal fusion network based on self-attention and residual structure (CFN-SR) for multimodal emotion recognition. To verify the effectiveness of the proposed method, we conduct experiments on the RAVDESS dataset. The experimental results show that the proposed CFN-SR achieves the state-of-the-art and obtains 75.76% accuracy with 26.30M parameters.
arXiv Detail & Related papers (2021-11-03T12:24:03Z)
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics. Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space. In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
Facial Expressions as a Vulnerability in Face Recognition [73.85525896663371]
This work explores facial expression bias as a security vulnerability of face recognition systems. We present a comprehensive analysis of how facial expression bias impacts the performance of face recognition technologies.
arXiv Detail & Related papers (2020-11-17T18:12:41Z)
Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area. Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on the videos. This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER. The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions. In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy. Deep neural networks have been used with great success in recognizing emotions. We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.