A Unified Transformer-based Network for multimodal Emotion Recognition
- URL: http://arxiv.org/abs/2308.14160v1
- Date: Sun, 27 Aug 2023 17:30:56 GMT
- Title: A Unified Transformer-based Network for multimodal Emotion Recognition
- Authors: Kamran Ali and Charles E. Hughes
- Abstract summary: We present a transformer-based method to classify emotions in an arousal-valence space by combining a 2D representation of an ECG/PPG signal with facial information.
Our model produces results comparable to state-of-the-art techniques.
- Score: 4.07926531936425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of transformer-based models has resulted in significant advances in addressing various vision and NLP-based research challenges. However, this progress has not yet been effectively applied to biosensing research. This paper presents a novel Unified Biosensor-Vision Multi-modal Transformer-based (UBVMT) method to classify emotions in an arousal-valence space by combining a 2D representation of an ECG/PPG signal with facial information. To achieve this goal, we first investigate and compare the unimodal emotion recognition performance of three image-based representations of the ECG/PPG signal. We then present our UBVMT network, which is trained to perform emotion recognition by combining the 2D image-based representation of the ECG/PPG signal with facial expression features. Our unified transformer model consists of homogeneous transformer blocks that take as input the 2D representation of the ECG/PPG signal and the corresponding face frame for emotion representation learning, with minimal modality-specific design. The UBVMT model is trained by reconstructing masked patches of video frames and 2D images of ECG/PPG signals, and by contrastive modeling to align face and ECG/PPG data. Extensive experiments on the MAHNOB-HCI and DEAP datasets show that our UBVMT model produces results comparable to state-of-the-art techniques.
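The paper itself ships no code here, but the contrastive-modeling objective described in the abstract can be sketched briefly. The snippet below is a minimal, hypothetical PyTorch rendering of a symmetric InfoNCE loss over paired face and ECG/PPG embeddings; the function name, temperature value, and projection details are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(face_emb: torch.Tensor,
                               signal_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over paired face / ECG-PPG embeddings.

    face_emb, signal_emb: (batch, dim) projections of a face frame and
    the 2D ECG/PPG image. Row i of each tensor is treated as a positive
    pair; every other row in the batch serves as a negative.
    """
    face_emb = F.normalize(face_emb, dim=-1)
    signal_emb = F.normalize(signal_emb, dim=-1)
    # Cosine-similarity logits between every face/signal pair in the batch.
    logits = face_emb @ signal_emb.t() / temperature
    targets = torch.arange(face_emb.size(0), device=face_emb.device)
    # Contrast in both directions: face -> signal and signal -> face.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

The masked-patch reconstruction objective would sit alongside this loss, typically as a mean-squared error computed only on the masked patches.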
Related papers
- Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network via the combination of convolution neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - Spatiotemporal Pooling on Appropriate Topological Maps Represented as Two-Dimensional Images for EEG Classification [0.0]
Motor imagery classification based on electroencephalography (EEG) signals is one of the most important brain-computer interface applications.
This study proposes a novel EEG-based motor imagery classification method with three key features.
Experimental results using the PhysioNet EEG Movement Motor/Imagery dataset showed that the proposed method achieved the best classification accuracy of 88.57%.
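The paper's exact map construction is not reproduced here; as a rough illustration of representing multichannel EEG as two-dimensional topological images, per-channel amplitudes can be placed on a scalp-shaped grid and stacked over time (the grid size, coordinate normalization, and all names below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def eeg_to_topomap(values: np.ndarray, coords: np.ndarray,
                   grid: int = 32) -> np.ndarray:
    """Place per-channel EEG amplitudes on a 2D scalp grid.

    values: (channels,) amplitudes for one time step.
    coords: (channels, 2) electrode positions, normalized to [0, 1].
    Returns a (grid, grid) image; cells with no electrode stay at zero.
    """
    image = np.zeros((grid, grid), dtype=np.float32)
    cells = np.clip((coords * (grid - 1)).round().astype(int), 0, grid - 1)
    image[cells[:, 1], cells[:, 0]] = values
    return image

def eeg_window_to_image_stack(window: np.ndarray,
                              coords: np.ndarray) -> np.ndarray:
    """Stack one topomap per time step: (time, channels) -> (time, H, W)."""
    return np.stack([eeg_to_topomap(frame, coords) for frame in window])
```

A 2D convolutional network with spatiotemporal pooling can then consume the resulting image stack, which is the broad strategy the entry describes.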
arXiv Detail & Related papers (2024-03-07T09:35:49Z) - Learning Robust Deep Visual Representations from EEG Brain Recordings [13.768240137063428]
This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations.
We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures.
We also propose a novel framework to transform unseen images into the EEG space and reconstruct them approximately.
arXiv Detail & Related papers (2023-10-25T10:26:07Z) - Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals [11.479653866646762]
This paper presents an efficient multi-scale transformer-based approach for emotion recognition from physiological data.
Our approach applies a multimodal technique combined with data scaling to establish the relationship between internal body signals and human emotions.
Our model achieves decent results on the CASE dataset of the EPiC competition, with an RMSE score of 1.45.
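As a toy sketch of the multi-scale idea (the kernel sizes, dimensions, and class name are assumptions, not the paper's architecture), parallel 1D convolutions at different temporal scales can tokenize the signal before a transformer encoder and a regression head:

```python
import torch
import torch.nn as nn

class MultiScaleTransformer(nn.Module):
    """Toy multi-scale encoder: parallel 1D convolutions with different
    kernel sizes tokenize the signal, a transformer models the sequence,
    and a linear head regresses valence/arousal."""

    def __init__(self, in_channels: int = 8, dim: int = 64):
        super().__init__()
        # Three temporal scales; outputs are concatenated channel-wise.
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, dim, kernel_size=k, stride=4, padding=k // 2)
            for k in (3, 7, 15)
        ])
        layer = nn.TransformerEncoderLayer(d_model=3 * dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(3 * dim, 2)  # valence, arousal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        feats = [b(x) for b in self.branches]
        # Branches with different padding can differ by one step in length;
        # crop to the shortest before concatenating.
        t = min(f.size(-1) for f in feats)
        tokens = torch.cat([f[..., :t] for f in feats], dim=1).transpose(1, 2)
        return self.head(self.encoder(tokens).mean(dim=1))
```

Training such a head against continuous valence/arousal annotations with a squared-error loss is what an RMSE score like the one quoted above would measure.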
arXiv Detail & Related papers (2023-05-01T11:10:48Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - View-Disentangled Transformer for Brain Lesion Detection [50.4918615815066]
We propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection.
First, the proposed transformer harvests long-range correlation among different positions in a 3D brain scan.
Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view-by-view.
Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions.
arXiv Detail & Related papers (2022-09-20T11:58:23Z) - Transformer-Based Self-Supervised Learning for Emotion Recognition [0.0]
We propose to use a Transformer-based model to process electrocardiograms (ECG) for emotion recognition.
To overcome the relatively small size of datasets with emotional labels, we employ self-supervised learning.
We show that our approach achieves state-of-the-art performance for emotion recognition using ECG signals on AMIGOS.
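As a hedged illustration of self-supervised pretraining on unlabeled ECG (the patching scheme, mask ratio, and names are assumptions, not the authors' method), one common recipe masks random signal patches and trains a transformer to reconstruct them:

```python
import torch
import torch.nn as nn

class MaskedECGPretrainer(nn.Module):
    """Toy masked-signal pretraining: split the ECG into patches,
    hide a random subset, and train a transformer to reconstruct them."""

    def __init__(self, patch: int = 16, dim: int = 64):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(dim, patch)

    def forward(self, ecg: torch.Tensor, mask_ratio: float = 0.5):
        # ecg: (batch, time); trailing samples that do not fill a
        # whole patch are dropped by unfold.
        patches = ecg.unfold(1, self.patch, self.patch)   # (B, N, patch)
        tokens = self.embed(patches)
        mask = torch.rand(tokens.shape[:2], device=ecg.device) < mask_ratio
        tokens[mask] = self.mask_token                    # hide masked patches
        recon = self.decode(self.encoder(tokens))
        # Reconstruction loss is computed on the hidden patches only.
        return ((recon - patches) ** 2)[mask].mean()
```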
arXiv Detail & Related papers (2022-04-08T07:14:55Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - A Transformer Architecture for Stress Detection from ECG [7.559720049837459]
We present a deep neural network based on convolutional layers and a transformer mechanism to detect stress using ECG signals.
Our experiments show that the proposed model achieves strong results, comparable to or better than the state-of-the-art models for ECG-based stress detection.
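A minimal sketch of such a convolution-plus-transformer pipeline might look as follows; the layer sizes and names are illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn

class ECGStressDetector(nn.Module):
    """Toy convolution + transformer pipeline for binary stress detection:
    1D convolutions downsample the raw ECG into tokens, a transformer
    encoder models their interactions, and a linear head classifies."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=7, stride=4, padding=3), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, stride=4, padding=2), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 2)  # stress / no stress

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        # ecg: (batch, time) raw signal.
        tokens = self.conv(ecg.unsqueeze(1)).transpose(1, 2)  # (B, T', dim)
        return self.head(self.encoder(tokens).mean(dim=1))    # class logits
```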
arXiv Detail & Related papers (2021-08-22T14:34:44Z) - IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
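IMAGINE's full pipeline enforces semantic-specificity constraints beyond what fits here; the sketch below only illustrates the core inversion loop, optimizing a random image so a frozen classifier's features approach those of a guide image, with every hyperparameter and name an assumption:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen pretrained classifier (downloads weights on first use);
# drop the final fc layer to expose pooled features.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
feat = torch.nn.Sequential(*list(net.children())[:-1])
for p in feat.parameters():
    p.requires_grad_(False)

guide = torch.rand(1, 3, 224, 224)          # stand-in for a real guide image
target = feat(guide).flatten(1).detach()

x = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = F.mse_loss(feat(x).flatten(1), target)
    # Total-variation term keeps neighbouring pixels smooth.
    tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
       + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    (loss + 1e-2 * tv).backward()
    opt.step()
    x.data.clamp_(0, 1)                      # keep pixels in valid range
```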
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.