Transformer-based Self-supervised Multimodal Representation Learning for
Wearable Emotion Recognition
- URL: http://arxiv.org/abs/2303.17611v1
- Date: Wed, 29 Mar 2023 19:45:55 GMT
- Title: Transformer-based Self-supervised Multimodal Representation Learning for
Wearable Emotion Recognition
- Authors: Yujin Wu, Mohamed Daoudi, Ali Amad
- Abstract summary: We propose a novel self-supervised learning (SSL) framework for wearable emotion recognition.
Our method achieved state-of-the-art results in various emotion classification tasks.
- Score: 2.4364387374267427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, wearable emotion recognition based on peripheral physiological
signals has drawn massive attention due to its less invasive nature and its
applicability in real-life scenarios. However, how to effectively fuse
multimodal data remains a challenging problem. Moreover, traditional
fully supervised approaches suffer from overfitting given limited labeled
data. To address the above issues, we propose a novel self-supervised learning
(SSL) framework for wearable emotion recognition, where efficient multimodal
fusion is realized with temporal convolution-based modality-specific encoders
and a transformer-based shared encoder, capturing both intra-modal and
inter-modal correlations. Extensive unlabeled data is automatically assigned
labels by five signal transforms, and the proposed SSL model is pre-trained
with signal transformation recognition as a pretext task, allowing the
extraction of generalized multimodal representations for emotion-related
downstream tasks. For evaluation, the proposed SSL model was first pre-trained
on a large-scale self-collected physiological dataset and the resulting encoder
was subsequently frozen or fine-tuned on three public supervised emotion
recognition datasets. Ultimately, our SSL-based method achieved
state-of-the-art results in various emotion classification tasks. Meanwhile,
the proposed model proved more accurate and robust than fully supervised
methods in low-data regimes.
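
The pretext task described in the abstract, where unlabeled signals receive labels from the transform applied to them, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the five transforms, so the five functions below (noise, scaling, negation, time flip, segment permutation) are assumed stand-ins. The extra "original" class (label 0), the univariate windows, and the helper name `make_pretext_dataset` are likewise assumptions made for the example.

```python
import numpy as np

# ASSUMPTION: the abstract mentions five signal transforms without naming
# them. These five are common choices for physiological signals and serve
# only as illustrative placeholders.
def add_noise(x, rng):
    return x + rng.normal(0.0, 0.1 * x.std() + 1e-8, size=x.shape)

def scale(x, rng):
    return x * rng.uniform(1.5, 2.0)

def negate(x, rng):
    return -x

def time_flip(x, rng):
    return x[::-1].copy()

def permute_segments(x, rng, n_segments=4):
    # Split the window into segments and shuffle their order.
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])

TRANSFORMS = [add_noise, scale, negate, time_flip, permute_segments]

def make_pretext_dataset(windows, seed=0):
    """Build (signals, labels) for transformation recognition.

    Label 0 marks the untouched window (an assumption of this sketch);
    label k marks the output of TRANSFORMS[k - 1].
    """
    rng = np.random.default_rng(seed)
    signals, labels = [], []
    for w in windows:
        signals.append(w)
        labels.append(0)
        for k, transform in enumerate(TRANSFORMS, start=1):
            signals.append(transform(w, rng))
            labels.append(k)
    return np.stack(signals), np.array(labels)

# Usage: three unlabeled windows of 128 samples each (synthetic data).
windows = np.random.default_rng(42).normal(size=(3, 128))
X, y = make_pretext_dataset(windows)
print(X.shape, np.bincount(y))
```

A classifier trained to predict `y` from `X` needs no human annotation, which is what allows pre-training on large unlabeled recordings before the encoder is frozen or fine-tuned on the labeled emotion datasets.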
Related papers
- MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection [0.0]
A new methodology for detecting surface defects in industrial settings is introduced, referred to as Memory Augmentation and Pseudo-Labeling (MAPL).
The methodology first introduces an anomaly simulation strategy, which significantly improves the model's ability to recognize rare or unknown anomaly types.
An end-to-end learning framework is employed by MAPL to identify the abnormal regions directly from the input data.
arXiv Detail & Related papers (2024-05-10T02:26:35Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- TACOformer: Token-channel compounded Cross Attention for Multimodal Emotion Recognition [0.951828574518325]
We propose a comprehensive perspective of multimodal fusion that integrates channel-level and token-level cross-modal interactions.
Specifically, we introduce a unified cross attention module called Token-chAnnel COmpound (TACO) Cross Attention.
We also propose a 2D position encoding method to preserve information about the spatial distribution of EEG signal channels.
arXiv Detail & Related papers (2023-06-23T16:28:12Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Enhancing Unsupervised Anomaly Detection with Score-Guided Network [13.127091975959358]
Anomaly detection plays a crucial role in various real-world applications, including healthcare and finance systems.
We propose a novel scoring network with a score-guided regularization to learn and enlarge the anomaly score disparities between normal and abnormal data.
We next propose a score-guided autoencoder (SG-AE), incorporating the scoring network into an autoencoder framework for anomaly detection.
arXiv Detail & Related papers (2021-09-10T06:14:53Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.