Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences
- URL: http://arxiv.org/abs/2405.04900v1
- Date: Wed, 8 May 2024 09:13:10 GMT
- Title: Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences
- Authors: Cheng Song, Lu Lu, Zhen Ke, Long Gao, Shuai Ding,
- Abstract summary: We propose a contrastive learning framework utilizing selective strong augmentation for self-supervised gait-based emotion representation.
Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.
- Score: 4.740624855896404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion recognition is an important part of affective computing. Extracting emotional cues from human gaits yields benefits such as natural interaction, a nonintrusive nature, and remote detection. Recently, the introduction of self-supervised learning techniques offers a practical solution to the issues arising from the scarcity of labeled data in the field of gait-based emotion recognition. However, due to the limited diversity of gaits and the incompleteness of feature representations for skeletons, the existing contrastive learning methods are usually inefficient for the acquisition of gait emotions. In this paper, we propose a contrastive learning framework utilizing selective strong augmentation (SSA) for self-supervised gait-based emotion representation, which aims to derive effective representations from limited labeled gait data. First, we propose an SSA method for the gait emotion recognition task, which includes upper body jitter and random spatiotemporal mask. The goal of SSA is to generate more diverse and targeted positive samples and prompt the model to learn more distinctive and robust feature representations. Then, we design a complementary feature fusion network (CFFN) that facilitates the integration of cross-domain information to acquire topological structural and global adaptive features. Finally, we implement the distributional divergence minimization loss to supervise the representation learning of the generally and strongly augmented queries. Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.
Related papers
- Deep Imbalanced Learning for Multimodal Emotion Recognition in
Conversations [15.705757672984662]
Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence.
Many data in MERC naturally exhibit an imbalanced distribution of emotion categories, and researchers ignore the negative impact of imbalanced data on emotion recognition.
We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data.
We have conducted extensive experiments on the IEMOCAP and MELD benchmark datasets, and the results show that CBERL has achieved a certain performance improvement in the effectiveness of emotion recognition.
arXiv Detail & Related papers (2023-12-11T12:35:17Z) - Unsupervised Representations Improve Supervised Learning in Speech
Emotion Recognition [1.3812010983144798]
This study proposes an innovative approach that integrates self-supervised feature extraction with supervised classification for emotion recognition from small audio segments.
In the preprocessing step, we employed a self-supervised feature extractor, based on the Wav2Vec model, to capture acoustic features from audio data.
Then, the output featuremaps of the preprocessing step are fed to a custom designed Convolutional Neural Network (CNN)-based model to perform emotion classification.
arXiv Detail & Related papers (2023-09-22T08:54:06Z) - EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for ERC task.
Our EmotionIC consists of three main components, i.e., Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU) and Skip-chain Conditional Random Field (SkipCRF)
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Affect-DML: Context-Aware One-Shot Recognition of Human Affect using
Deep Metric Learning [29.262204241732565]
Existing methods assume that all emotions-of-interest are given a priori as annotated training examples.
We conceptualize one-shot recognition of emotions in context -- a new problem aimed at recognizing human affect states in finer particle level from a single support sample.
All variants of our model clearly outperform the random baseline, while leveraging the semantic scene context consistently improves the learnt representations.
arXiv Detail & Related papers (2021-11-30T10:35:20Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal
Emotion Recognition [118.73025093045652]
We propose a pre-training model textbfMEmoBERT for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues by attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z) - Leveraging Semantic Scene Characteristics and Multi-Stream Convolutional
Architectures in a Contextual Approach for Video-Based Visual Emotion
Recognition in the Wild [31.40575057347465]
We tackle the task of video-based visual emotion recognition in the wild.
Standard methodologies that rely solely on the extraction of bodily and facial features often fall short of accurate emotion prediction.
We aspire to alleviate this problem by leveraging visual context in the form of scene characteristics and attributes.
arXiv Detail & Related papers (2021-05-16T17:31:59Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn)
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.