FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space
- URL: http://arxiv.org/abs/2405.01828v3
- Date: Fri, 10 May 2024 02:49:16 GMT
- Title: FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space
- Authors: Hui Ma, Sen Lei, Turgay Celik, Heng-Chao Li,
- Abstract summary: This paper presents the FER-YOLO-Mamba model, which integrates the principles of Mamba and YOLO technologies.
Within the FER-YOLO-Mamba model, we further devise a FER-YOLO-VSS dual-branch module, which combines the inherent strengths of convolutional layers in local feature extraction.
To the best of our knowledge, this is the first Vision Mamba model designed for facial expression detection and classification.
- Score: 9.68374853606234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Expression Recognition (FER) plays a pivotal role in understanding human emotional cues. However, traditional FER methods based on visual information have some limitations, such as preprocessing, feature extraction, and multi-stage classification procedures. These not only increase computational complexity but also require a significant amount of computing resources. Considering Convolutional Neural Network (CNN)-based FER schemes frequently prove inadequate in identifying the deep, long-distance dependencies embedded within facial expression images, and the Transformer's inherent quadratic computational complexity, this paper presents the FER-YOLO-Mamba model, which integrates the principles of Mamba and YOLO technologies to facilitate efficient coordination in facial expression image recognition and localization. Within the FER-YOLO-Mamba model, we further devise a FER-YOLO-VSS dual-branch module, which combines the inherent strengths of convolutional layers in local feature extraction with the exceptional capability of State Space Models (SSMs) in revealing long-distance dependencies. To the best of our knowledge, this is the first Vision Mamba model designed for facial expression detection and classification. To evaluate the performance of the proposed FER-YOLO-Mamba model, we conducted experiments on two benchmark datasets, RAF-DB and SFEW. The experimental results indicate that the FER-YOLO-Mamba model achieved better results compared to other models. The code is available from https://github.com/SwjtuMa/FER-YOLO-Mamba.
Related papers
- Hierarchical Spatio-Temporal State-Space Modeling for fMRI Analysis [1.7329715392023939]
We introduce functional functional Mamba (FST-Mamba), a Mamba-based model designed for discovering neurological biomarkers using fMRI.
We propose a component-wise varied-scale aggregation (CVA) mechanism to aggregate connectivity across individual components within brain networks.
Experimental results demonstrate significant improvements in the proposed FST-Mamba model on various brain-based classification and regression tasks.
arXiv Detail & Related papers (2024-08-23T13:58:14Z) - Mamba YOLO: SSMs-Based YOLO For Object Detection [9.879086222226617]
Mamba-YOLO is a novel object detection model based on State Space Models.
We show that Mamba-YOLO surpasses the existing YOLO series models in both performance and competitiveness.
arXiv Detail & Related papers (2024-06-09T15:56:19Z) - Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt of deploying State Space Model (SSM) in this task.
MiM model includes 1) A novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence-data, 2) A Tokenized Mamba (T-Mamba) encoder, and 3) A Weighted MCS Fusion (WMF) module.
Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL)
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Multi-Branch Deep Radial Basis Function Networks for Facial Emotion
Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show it is the incorporation of local information what makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z) - Self-Regression Learning for Blind Hyperspectral Image Fusion Without
Label [11.291055330647977]
We propose a self-regression learning method that reconstructs hyperspectral image (HSI) and estimate the observation model.
In particular, we adopt an invertible neural network (INN) for restoring the HSI, and two fully-connected networks (FCN) for estimating the observation model.
Our model can outperform the state-of-the-art methods in experiments on both synthetic and real-world dataset.
arXiv Detail & Related papers (2021-03-31T04:48:21Z) - Video-based Facial Expression Recognition using Graph Convolutional
Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z) - Kernelized dense layers for facial expression recognition [10.98068123467568]
We propose a Kernelized Dense Layer (KDL) which captures higher order feature interactions instead of conventional linear relations.
We show that our model achieves competitive results with respect to the state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-22T21:02:00Z) - Multi-Margin based Decorrelation Learning for Heterogeneous Face
Recognition [90.26023388850771]
This paper presents a deep neural network approach to extract decorrelation representations in a hyperspherical space for cross-domain face images.
The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning.
Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks.
arXiv Detail & Related papers (2020-05-25T07:01:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.