Related papers: ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition

URL: http://arxiv.org/abs/2510.13493v2
Date: Fri, 24 Oct 2025 23:01:05 GMT
Title: ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition
Authors: Deeptimaan Banerjee, Prateek Gothwal, Ashis Kumer Biswas
Abstract summary: ExpressNet-MoE is a novel hybrid deep learning model that blends CNNs and an MoE framework. Our model achieves accuracies of 74.77% on AffectNet (v7), 72.55% on AffectNet (v8), 84.29% on RAF-DB, and 64.66% on FER-2013.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Facial emotion recognition (FER) is essential in many domains, including online education, healthcare, security, and human-computer interaction. Despite its significance, real-world FER remains difficult because of factors such as variable head positions, occlusions, illumination shifts, and demographic diversity. Engagement detection, which is essential for applications like virtual learning and customer services, is frequently challenging because of the FER limitations of many current models. In this article, we propose ExpressNet-MoE, a novel hybrid deep learning model that blends Convolutional Neural Networks (CNNs) with a Mixture of Experts (MoE) framework to overcome these difficulties. Our model dynamically chooses the most pertinent expert networks, which aids generalization and gives the model flexibility across a wide variety of datasets. It improves the accuracy of emotion recognition by using multi-scale feature extraction to capture both global and local facial features. ExpressNet-MoE includes several CNN-based feature extractors, an MoE module for adaptive feature selection, and a residual network backbone for deep feature learning. To demonstrate the efficacy of the proposed model, we evaluated it on several datasets and compared it with current state-of-the-art methods. Our model achieves accuracies of 74.77% on AffectNet (v7), 72.55% on AffectNet (v8), 84.29% on RAF-DB, and 64.66% on FER-2013. The results show how adaptive our model is and how it may be used to develop end-to-end emotion recognition systems in practical settings. Reproducible code and results are publicly accessible at https://github.com/DeeptimaanB/ExpressNet-MoE.
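
Illustrative sketch (not the authors' code): the abstract says the model "dynamically chooses the most pertinent expert networks" but does not specify the gating rule. The snippet below shows one common way such a selection can work, assuming softmax gating with top-k routing. The names `moe_forward`, `gate_weights`, `expert_weights`, and `top_k` are hypothetical, and the experts here are plain linear maps rather than CNN feature extractors. For the actual architecture, see the repository linked in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_forward(x, gate_weights, expert_weights, top_k=2):
    """Route x through the top_k experts chosen by a softmax gate.

    Assumption: top-k softmax gating; the paper may use a different rule.
    gate_weights has one row per expert; expert_weights[i] is the weight
    matrix of expert i. All experts must share the same output size.
    """
    gate_probs = softmax(linear(gate_weights, x))
    # Indices of the top_k experts with the highest gate probability.
    ranked = sorted(range(len(gate_probs)),
                    key=lambda i: gate_probs[i], reverse=True)
    selected = ranked[:top_k]
    # Renormalize so the selected experts' mixing weights sum to 1.
    norm = sum(gate_probs[i] for i in selected)
    out = [0.0] * len(expert_weights[0])
    for i in selected:
        expert_out = linear(expert_weights[i], x)
        mix = gate_probs[i] / norm
        out = [o + mix * y for o, y in zip(out, expert_out)]
    return out, selected


if __name__ == "__main__":
    # Toy input: a 2-dimensional feature vector and three 1-output experts.
    features = [1.0, 0.0]
    gate = [[5.0, 0.0], [0.0, 0.0], [-5.0, 0.0]]
    experts = [[[2.0, 0.0]], [[4.0, 0.0]], [[8.0, 0.0]]]
    output, chosen = moe_forward(features, gate, experts, top_k=2)
    print("selected experts:", chosen)
    print("mixed output:", output)
```

Only the selected experts are evaluated, which is what makes this kind of routing input-dependent: a different feature vector can activate a different subset of experts.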

Related papers

Multi-modal Transfer Learning for Dynamic Facial Emotion Recognition in the Wild [0.14999444543328289]
Facial expression recognition (FER) is a subset of computer vision with important applications for human-computer-interaction, healthcare, and customer service. In this paper, we examine the use of multi-modal transfer learning to improve performance on a challenging video-based FER dataset.
arXiv Detail & Related papers (2025-04-30T01:09:11Z)
An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing. It is designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z)
A Comparative Study of Transfer Learning for Emotion Recognition using CNN and Modified VGG16 Models [0.0]
We investigate the performance of CNN and Modified VGG16 models for emotion recognition tasks across two datasets: FER2013 and AffectNet. Our findings reveal that both models achieve reasonable performance on the FER2013 dataset, with the Modified VGG16 model demonstrating slightly increased accuracy. When evaluated on the AffectNet dataset, performance declines for both models, with the Modified VGG16 model continuing to outperform the CNN.
arXiv Detail & Related papers (2024-07-19T17:41:46Z)
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK). EMERSK is a general system for human emotion recognition and explanation using visual information. Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z)
Machine Learning for QoS Prediction in Vehicular Communication: Challenges and Solution Approaches [46.52224306624461]
We consider maximum throughput prediction enhancing, for example, streaming or high-definition mapping applications. We highlight how confidence can be built on machine learning technologies by better understanding the underlying characteristics of the collected data. We use explainable AI to show that machine learning can learn underlying principles of wireless networks without being explicitly programmed.
arXiv Detail & Related papers (2023-02-23T12:29:20Z)
Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures. Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z)
Multimodal End-to-End Group Emotion Recognition using Cross-Modal Attention [0.0]
Classifying group-level emotions is a challenging task due to the complexity of video. Our model achieves a best validation accuracy of 60.37%, which is approximately 8.5% higher than the VGAF dataset baseline.
arXiv Detail & Related papers (2021-11-10T19:19:26Z)
Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN-based architecture enhanced with multiple branches formed by radial basis function (RBF) units. RBF units capture local patterns shared by similar instances using an intermediate representation. We show that it is the incorporation of local information that makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
Facial Emotion Recognition: State of the Art Performance on FER2013 [0.0]
We achieve the highest single-network classification accuracy on the FER2013 dataset. Our model achieves state-of-the-art single-network accuracy of 73.28% on FER2013 without using extra training data.
arXiv Detail & Related papers (2021-05-08T04:20:53Z)
Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition. We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.