HadaSmileNet: Hadamard fusion of handcrafted and deep-learning features for enhancing facial emotion recognition of genuine smiles
- URL: http://arxiv.org/abs/2509.18550v1
- Date: Tue, 23 Sep 2025 02:20:43 GMT
- Title: HadaSmileNet: Hadamard fusion of handcrafted and deep-learning features for enhancing facial emotion recognition of genuine smiles
- Authors: Mohammad Junayed Hasan, Nabeel Mohammed, Shafin Rahman, Philipp Koehn
- Abstract summary: The distinction between genuine and posed emotions represents a fundamental pattern recognition challenge. HadaSmileNet is a novel feature fusion framework that directly integrates transformer-based representations with physiologically grounded D-Markers. The framework's efficiency and effectiveness make it particularly suitable for practical deployment in multimedia data mining applications.
- Score: 16.29396284428089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The distinction between genuine and posed emotions represents a fundamental pattern recognition challenge with significant implications for data mining applications in social sciences, healthcare, and human-computer interaction. While recent multi-task learning frameworks have shown promise in combining deep learning architectures with handcrafted D-Marker features for smile facial emotion recognition, these approaches exhibit computational inefficiencies due to auxiliary task supervision and complex loss balancing requirements. This paper introduces HadaSmileNet, a novel feature fusion framework that directly integrates transformer-based representations with physiologically grounded D-Markers through parameter-free multiplicative interactions. Through systematic evaluation of 15 fusion strategies, we demonstrate that Hadamard multiplicative fusion achieves optimal performance by enabling direct feature interactions while maintaining computational efficiency. The proposed approach establishes new state-of-the-art results for deep learning methods across four benchmark datasets: UvA-NEMO (88.7%, +0.8), MMI (99.7%), SPOS (98.5%, +0.7), and BBC (100%, +5.0). Comprehensive computational analysis reveals 26% parameter reduction and simplified training compared to multi-task alternatives, while feature visualization demonstrates enhanced discriminative power through direct domain knowledge integration. The framework's efficiency and effectiveness make it particularly suitable for practical deployment in multimedia data mining applications that require real-time affective computing capabilities.
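As a rough illustration of the parameter-free multiplicative fusion the abstract describes, the following PyTorch sketch combines a transformer embedding with a handcrafted D-Marker vector through an element-wise (Hadamard) product. The feature dimensions, the linear projection used to align the two spaces, and the classifier head are assumptions made for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class HadamardFusionHead(nn.Module):
    """Illustrative sketch of Hadamard (element-wise) fusion of deep and
    handcrafted features. Dimensions and layers are assumed, not taken
    from the paper."""

    def __init__(self, deep_dim=768, dmarker_dim=25, num_classes=2):
        super().__init__()
        # Project the low-dimensional D-Marker vector into the deep feature
        # space so the two representations can interact element-wise.
        self.project = nn.Linear(dmarker_dim, deep_dim)
        self.classifier = nn.Linear(deep_dim, num_classes)

    def forward(self, deep_feat, dmarker_feat):
        # deep_feat:    (batch, deep_dim), e.g. a transformer [CLS] embedding
        # dmarker_feat: (batch, dmarker_dim), handcrafted physiological descriptors
        fused = deep_feat * self.project(dmarker_feat)  # Hadamard product
        return self.classifier(fused)

# Toy usage with random tensors (e.g. genuine vs. posed smile -> 2 classes)
head = HadamardFusionHead()
logits = head(torch.randn(4, 768), torch.randn(4, 25))
print(logits.shape)  # torch.Size([4, 2])
```

Note that the fusion step itself introduces no trainable parameters beyond the alignment projection, which matches the parameter-free multiplicative interaction emphasized in the abstract.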
Related papers
- Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z) - A Cloud-Based Cross-Modal Transformer for Emotion Recognition and Adaptive Human-Computer Interaction [4.6927139685668315]
A Cloud-Based Cross-Modal Transformer (CMT) framework for multimodal emotion recognition and adaptive human-computer interaction. The model integrates visual, auditory, and textual signals using pretrained encoders. The system enables scalable, low-latency emotion recognition for large-scale user interactions.
arXiv Detail & Related papers (2025-11-21T17:29:16Z) - Focus Through Motion: RGB-Event Collaborative Token Sparsification for Efficient Object Detection [56.88160531995454]
Existing RGB-Event detection methods process the low-information regions of both modalities uniformly during feature extraction and fusion. We propose FocusMamba, which performs adaptive collaborative sparsification of multimodal features. Experiments on the DSEC-Det and PKU-DAVIS-SOD datasets demonstrate that the proposed method achieves superior performance in both accuracy and efficiency.
arXiv Detail & Related papers (2025-09-04T04:18:46Z) - Toward Efficient Speech Emotion Recognition via Spectral Learning and Attention [0.5371337604556311]
Speech Emotion Recognition (SER) traditionally relies on auditory data analysis for emotion classification. We use Mel-Frequency Cepstral Coefficients (MFCCs) as spectral features to bridge the gap between computational emotion processing and human auditory perception. We propose a novel 1D-CNN-based SER framework that integrates data augmentation techniques.
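A minimal sketch of the kind of pipeline this entry describes: MFCCs extracted from a waveform and fed to a small 1D CNN classifier. The synthetic waveform, layer sizes, and number of emotion classes are placeholders, not the paper's configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# Synthetic 3-second waveform stands in for a real recording.
sr = 16000
waveform = np.random.randn(sr * 3).astype(np.float32)

# 40 MFCCs per frame -> array of shape (n_mfcc, n_frames)
mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=40)

class Simple1DCNN(nn.Module):
    """Toy 1D CNN over the MFCC time axis; not the paper's architecture."""
    def __init__(self, n_mfcc=40, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):  # x: (batch, n_mfcc, n_frames)
        return self.classifier(self.features(x).squeeze(-1))

model = Simple1DCNN()
logits = model(torch.from_numpy(mfcc).float().unsqueeze(0))
print(logits.shape)  # torch.Size([1, 4])
```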
arXiv Detail & Related papers (2025-07-04T01:55:49Z) - Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition [16.616341358877243]
This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals. The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities. A key innovation of this work is the adoption of a multiple instance learning (MIL) approach, which extracts meaningful information from multiple facial expression images.
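The MIL idea mentioned here can be sketched as attention pooling over a bag of per-frame face embeddings; the embedding size, bag size, and scoring network below are illustrative assumptions, not the Milmer design.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Toy attention-based MIL pooling: weight each instance (a facial-frame
    embedding) and sum into a single bag-level representation."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))

    def forward(self, bag):  # bag: (batch, n_instances, dim)
        weights = torch.softmax(self.score(bag), dim=1)  # (batch, n_instances, 1)
        return (weights * bag).sum(dim=1)                # (batch, dim)

pool = AttentionMILPooling()
frames = torch.randn(2, 8, 256)   # 8 facial-expression frames per sample
print(pool(frames).shape)         # torch.Size([2, 256])
```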
arXiv Detail & Related papers (2025-02-01T20:32:57Z) - GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages multi-modal contextual information from utterances and applies metaheuristic algorithms for utterance-level sentiment and emotion prediction.
To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z) - Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences [4.740624855896404]
We propose a contrastive learning framework utilizing selective strong augmentation for self-supervised gait-based emotion representation.
Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.
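As a generic illustration of the contrastive setup this entry refers to, the sketch below computes an InfoNCE-style loss between embeddings of a weakly and a strongly augmented view of the same skeleton sequence; the encoder, augmentations, and temperature are placeholder assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def info_nce(z_weak, z_strong, temperature=0.1):
    """InfoNCE loss treating matching (weak, strong) views as positives
    and all other samples in the batch as negatives."""
    z_weak = F.normalize(z_weak, dim=1)
    z_strong = F.normalize(z_strong, dim=1)
    logits = z_weak @ z_strong.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(z_weak.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy embeddings standing in for encoded skeleton sequences.
loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```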
arXiv Detail & Related papers (2024-05-08T09:13:10Z) - Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
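A minimal sketch of cross-attention fusion in the spirit of this entry: one modality supplies the queries, the other the keys and values, via nn.MultiheadAttention. The dimensions and the single attention layer are assumptions, not the JMT architecture.

```python
import torch
import torch.nn as nn

dim, heads = 256, 4
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

# Toy token sequences for two modalities (e.g. visual and audio).
visual = torch.randn(2, 10, dim)   # queries
audio = torch.randn(2, 20, dim)    # keys / values

# Visual tokens attend over audio tokens; output stays aligned with the queries.
fused, attn_weights = cross_attn(query=visual, key=audio, value=audio)
print(fused.shape)         # torch.Size([2, 10, 256])
print(attn_weights.shape)  # torch.Size([2, 10, 20])
```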
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
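A SpecAugment-style sketch of the spectrogram augmentation idea: random frequency and time masks applied to a mel spectrogram. The mask sizes and the synthetic input are assumptions, not the paper's settings.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Apply one random frequency mask and one random time mask to a spectrogram."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape

    f = rng.integers(0, max_freq_mask + 1)   # width of the frequency band to zero
    f0 = rng.integers(0, n_mels - f + 1)
    spec[f0:f0 + f, :] = 0.0

    t = rng.integers(0, max_time_mask + 1)   # width of the time span to zero
    t0 = rng.integers(0, n_frames - t + 1)
    spec[:, t0:t0 + t] = 0.0
    return spec

# Toy mel spectrogram: 64 mel bands x 200 frames.
augmented = mask_spectrogram(np.random.rand(64, 200))
print(augmented.shape)  # (64, 200)
```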
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
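A compact sketch of the two-stage idea this entry describes: features from a (here untrained) convolutional encoder are fed to a support vector regressor that predicts a continuous emotion value. The encoder layout, feature size, and random data are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import numpy as np
from sklearn.svm import SVR

# Tiny convolutional encoder standing in for the trained autoencoder's encoder half.
encoder = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 64x64 -> 32x32
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
    nn.ReLU(),
    nn.Flatten(),                                           # 16*16*16 = 4096 features
)

# Toy face crops and continuous labels (e.g. valence in [-1, 1]).
images = torch.randn(32, 1, 64, 64)
valence = np.random.uniform(-1, 1, size=32)

with torch.no_grad():
    features = encoder(images).numpy()

# The support vector regressor maps encoder features to the continuous target.
svr = SVR(kernel="rbf").fit(features, valence)
print(svr.predict(features[:5]))
```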
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.