MSA-GCN: Multiscale Adaptive Graph Convolution Network for Gait Emotion Recognition
- URL: http://arxiv.org/abs/2209.08988v1
- Date: Mon, 19 Sep 2022 13:07:16 GMT
- Title: MSA-GCN: Multiscale Adaptive Graph Convolution Network for Gait Emotion Recognition
- Authors: Yunfei Yin, Li Jing, Faliang Huang, Guangchao Yang, Zhuowei Wang
- Abstract summary: We present a novel Multi Scale Adaptive Graph Convolution Network (MSA-GCN) to recognize emotions.
In our model, an adaptive selective spatial-temporal graph convolution is designed to select the convolution kernel dynamically and obtain soft spatio-temporal features of different emotions.
Compared with previous state-of-the-art methods, the proposed method achieves the best performance on two public datasets.
- Score: 6.108523790270448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait emotion recognition plays a crucial role in intelligent systems. Most
of the existing methods recognize emotions by focusing on local actions over
time. However, they ignore that the effective distances of different emotions
in the time domain are different, and the local actions during walking are
quite similar. Thus, emotions should be represented by global states instead of
indirect local actions. To address these issues, a novel Multi Scale Adaptive
Graph Convolution Network (MSA-GCN) is presented in this work through
constructing dynamic temporal receptive fields and designing multiscale
information aggregation to recognize emotions. In our model, an adaptive
selective spatial-temporal graph convolution is designed to select the
convolution kernel dynamically to obtain the soft spatio-temporal features of
different emotions. Moreover, a Cross-Scale mapping Fusion Mechanism (CSFM) is
designed to construct an adaptive adjacency matrix to enhance information
interaction and reduce redundancy. Compared with previous state-of-the-art
methods, the proposed method achieves the best performance on two public
datasets, improving the mAP by 2%. We also conduct extensive ablation studies to show the effectiveness of different components in our method.
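The listing above gives no implementation details beyond the abstract. As a rough, hypothetical sketch of the two mechanisms it names, the PyTorch fragment below implements (a) a selective temporal convolution that learns a soft choice among several kernel sizes, in the spirit of the adaptive selective spatial-temporal graph convolution, and (b) a spatial graph convolution whose adjacency matrix has a learnable component, loosely echoing the adaptive adjacency used by the CSFM. All module names, layer sizes, the gating head, and the 16-joint skeleton are assumptions made for illustration; they are not the authors' architecture.

```python
# Hypothetical sketch (PyTorch), NOT the authors' code: layer sizes, the gating
# head, and the 16-joint skeleton are assumptions used only to illustrate the
# two ideas named in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveTemporalConv(nn.Module):
    """Soft selection among temporal kernels of different sizes (dynamic receptive field)."""

    def __init__(self, channels, kernel_sizes=(3, 5, 9)):
        super().__init__()
        # One temporal convolution branch per kernel size; inputs are (N, C, T, V).
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        ])
        # Tiny gating head that scores each branch from globally pooled features.
        self.gate = nn.Linear(channels, len(kernel_sizes))

    def forward(self, x):                                            # x: (N, C, T, V)
        feats = torch.stack([b(x) for b in self.branches], dim=1)    # (N, B, C, T, V)
        weights = F.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)    # (N, B)
        return (weights[:, :, None, None, None] * feats).sum(dim=1)  # soft kernel choice


class AdaptiveGraphConv(nn.Module):
    """Spatial graph convolution with a learnable adjacency added to a fixed skeleton graph."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.register_buffer("A_fixed", adjacency)                   # (V, V) skeleton graph
        self.A_learned = nn.Parameter(torch.zeros_like(adjacency))   # adaptive component
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                                            # x: (N, C, T, V)
        A = self.A_fixed + self.A_learned
        x = torch.einsum("nctv,vw->nctw", x, A)                      # aggregate over joints
        return self.proj(x)


if __name__ == "__main__":
    V = 16                                    # assumed number of skeleton joints
    A = torch.eye(V)                          # placeholder adjacency matrix
    x = torch.randn(2, 32, 48, V)             # (batch, channels, frames, joints)
    y = SelectiveTemporalConv(64)(AdaptiveGraphConv(32, 64, A)(x))
    print(y.shape)                            # torch.Size([2, 64, 48, 16])
```

The gating head here scores each temporal branch from globally pooled features, which is one simple way to realize a dynamic temporal receptive field; the published model should be consulted for the actual design.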
Related papers
- Dynamic Graph Neural ODE Network for Multi-modal Emotion Recognition in Conversation [14.158939954453933]
We propose a Dynamic Graph Neural Ordinary Differential Equation Network (DGODE) for multimodal emotion recognition in conversation (MERC).
The proposed DGODE combines the dynamic changes of emotions to capture the temporal dependency of speakers' emotions.
Experiments on two publicly available multimodal emotion recognition datasets demonstrate that the proposed DGODE model has superior performance compared to various baselines.
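As loose context for the graph-ODE idea (and not DGODE's actual formulation), the sketch below evolves node states under dh/dt = f(h, A), with a graph convolution as the derivative and plain Euler integration; the toy graph, state size, and integrator are all illustrative assumptions.

```python
# Minimal graph-ODE sketch (PyTorch): node states evolve continuously under a
# graph convolution, dh/dt = f(h, A). The toy graph, sizes, and Euler integration
# are illustrative assumptions, not the DGODE architecture.
import torch
import torch.nn as nn


class GraphODEFunc(nn.Module):
    def __init__(self, dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)      # (V, V) conversation graph
        self.lin = nn.Linear(dim, dim)

    def forward(self, h):                         # h: (V, dim) node states
        return torch.tanh(self.lin(self.A @ h))   # time derivative of the states


def euler_integrate(func, h0, t_span=1.0, steps=10):
    """Integrate dh/dt = func(h) from 0 to t_span with fixed Euler steps."""
    h, dt = h0, t_span / steps
    for _ in range(steps):
        h = h + dt * func(h)
    return h


if __name__ == "__main__":
    n_nodes, dim = 5, 8                           # e.g. 5 utterances, 8-dim states
    A = torch.ones(n_nodes, n_nodes) / n_nodes    # toy row-normalized graph
    h_final = euler_integrate(GraphODEFunc(dim, A), torch.randn(n_nodes, dim))
    print(h_final.shape)                          # torch.Size([5, 8])
```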
arXiv Detail & Related papers (2024-12-04T01:07:59Z) - Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences [4.740624855896404]
We propose a contrastive learning framework utilizing selective strong augmentation for self-supervised gait-based emotion representation.
Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.
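For readers unfamiliar with the setup, the sketch below shows a generic two-view contrastive objective (an NT-Xent-style loss) over augmented skeleton sequences; it is a stand-in, not the paper's selective strong augmentation or encoder, and the jitter augmentation, dummy encoder, and temperature are placeholders.

```python
# Generic two-view contrastive sketch (PyTorch), not the paper's framework: an
# NT-Xent-style loss over two augmented views of each skeleton sequence. The
# jitter augmentation, dummy encoder, and temperature are placeholders.
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N sequences."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])  # other view is positive
    return F.cross_entropy(sim, targets)


def jitter(seq, scale=0.02):
    """Toy augmentation: additive noise on joint coordinates (stand-in only)."""
    return seq + scale * torch.randn_like(seq)


if __name__ == "__main__":
    seqs = torch.randn(8, 48, 16, 3)                      # (batch, frames, joints, xyz)
    enc = torch.nn.Linear(48 * 16 * 3, 32)                # dummy encoder over flattened input
    loss = nt_xent(enc(jitter(seqs).flatten(1)), enc(jitter(seqs).flatten(1)))
    print(float(loss))
```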
arXiv Detail & Related papers (2024-05-08T09:13:10Z) - Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition (AR-IIGCN) method.
Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces.
Secondly, we build a generator and a discriminator for the three modal features through adversarial representation.
Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
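To make the first two steps above concrete, a simplified, hypothetical sketch follows: per-modality MLPs map video, audio, and text features into equal-sized spaces, and a small discriminator scores which modality an embedding came from; the graph contrastive step is omitted, and all dimensions are assumptions rather than the AR-IIGCN implementation.

```python
# Simplified, hypothetical sketch (PyTorch) of the first two steps described
# above; all feature sizes are assumptions, not the AR-IIGCN implementation.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


class ModalityMapper(nn.Module):
    """Step 1: per-modality MLPs map features into equal-sized spaces."""

    def __init__(self, dims, out_dim=64):
        super().__init__()
        self.mappers = nn.ModuleDict({m: mlp(d, out_dim) for m, d in dims.items()})

    def forward(self, feats):                     # feats: dict of (N, dim) tensors
        return {m: self.mappers[m](x) for m, x in feats.items()}


class ModalityDiscriminator(nn.Module):
    """Step 2: discriminator guessing which modality an embedding came from."""

    def __init__(self, dim=64, n_modalities=3):
        super().__init__()
        self.head = mlp(dim, n_modalities)

    def forward(self, z):                         # z: (N, dim)
        return self.head(z)                       # logits over {video, audio, text}


if __name__ == "__main__":
    feats = {"video": torch.randn(4, 512),        # assumed per-modality feature sizes
             "audio": torch.randn(4, 128),
             "text": torch.randn(4, 768)}
    z = ModalityMapper({"video": 512, "audio": 128, "text": 768})(feats)
    print(ModalityDiscriminator()(z["audio"]).shape)   # torch.Size([4, 3])
```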
arXiv Detail & Related papers (2023-12-28T01:57:26Z) - Improving EEG-based Emotion Recognition by Fusing Time-frequency And Spatial Representations [29.962519978925236]
We propose a classification network based on the cross-domain feature fusion method.
We also propose a two-step fusion method and apply these methods to the EEG emotion recognition network.
Experimental results show that our proposed network, which combines multiple representations in the time-frequency domain and spatial domain, outperforms previous methods on public datasets.
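As a generic illustration of fusing a time-frequency representation with a spatial one (not the paper's specific two-step fusion), the sketch below encodes each input with a small convolutional branch and concatenates the pooled features before classification; all shapes and layer sizes are assumptions.

```python
# Generic two-branch fusion sketch (PyTorch); input shapes and layers are
# assumptions, not the paper's cross-domain fusion network.
import torch
import torch.nn as nn


class TwoBranchFusion(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        # Branch 1: time-frequency maps, assumed shape (channels=32, freq=40, time=100).
        self.tf_branch = nn.Sequential(
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Branch 2: spatial features, assumed a (1, 32, 32) electrode-topology map.
        self.sp_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(16 + 16, n_classes)

    def forward(self, tf_map, sp_map):
        fused = torch.cat([self.tf_branch(tf_map), self.sp_branch(sp_map)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    out = TwoBranchFusion()(torch.randn(2, 32, 40, 100), torch.randn(2, 1, 32, 32))
    print(out.shape)                               # torch.Size([2, 4])
```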
arXiv Detail & Related papers (2023-03-14T07:26:51Z) - DWFormer: Dynamic Window transFormer for Speech Emotion Recognition [16.07391331544217]
We propose Dynamic Window transFormer (DWFormer) to locate important regions at different temporal scales.
DWFormer is evaluated on both the IEMOCAP and the MELD datasets.
Experimental results show that the proposed model achieves better performance than the previous state-of-the-art methods.
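As a rough illustration of restricting self-attention to local temporal regions (the dynamic, data-driven window selection that defines DWFormer is not reproduced), the sketch below applies standard self-attention independently within fixed-length chunks of a speech feature sequence; the window length and dimensions are assumptions.

```python
# Windowed self-attention sketch (PyTorch): standard attention applied within
# fixed-length temporal chunks. DWFormer's dynamic window selection is not
# implemented; the window size and dimensions are assumptions.
import torch
import torch.nn as nn


class WindowedSelfAttention(nn.Module):
    def __init__(self, dim=64, n_heads=4, window=25):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):                          # x: (N, T, dim), T divisible by window
        n, t, d = x.shape
        chunks = x.reshape(n * (t // self.window), self.window, d)
        out, _ = self.attn(chunks, chunks, chunks) # attention restricted to each window
        return out.reshape(n, t, d)


if __name__ == "__main__":
    frames = torch.randn(2, 100, 64)               # (batch, frames, feature dim)
    print(WindowedSelfAttention()(frames).shape)   # torch.Size([2, 100, 64])
```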
arXiv Detail & Related papers (2023-03-03T03:26:53Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
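A minimal late-fusion sketch follows: fixed-size embeddings assumed to come from a speech encoder and a text encoder are concatenated and classified. The pretrained speaker-recognition and BERT-based encoders themselves are not shown, and the embedding sizes and classifier head are assumptions.

```python
# Late-fusion sketch (PyTorch): embeddings from a speech encoder and a text
# encoder are concatenated and classified. The pretrained encoders are assumed
# to be given; the embedding sizes and head are assumptions.
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    def __init__(self, speech_dim=192, text_dim=768, n_classes=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(256, n_classes))

    def forward(self, speech_emb, text_emb):       # (N, speech_dim), (N, text_dim)
        return self.head(torch.cat([speech_emb, text_emb], dim=1))


if __name__ == "__main__":
    logits = LateFusionClassifier()(torch.randn(2, 192), torch.randn(2, 768))
    print(logits.shape)                            # torch.Size([2, 4])
```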
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
This paper presents a new model named Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states.
We empirically show that the attention-aligned representations outperform the last-hidden-states of LSTM significantly.
The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
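As a small illustration of the idea of attending over all hidden states instead of keeping only the last hidden state (not GBAN's gated bidirectional alignment itself), the sketch below pools bidirectional LSTM outputs with a learned attention weighting; sizes are assumptions.

```python
# Attention-pooling sketch (PyTorch): a learned weighting over all LSTM hidden
# states replaces the usual last-hidden-state summary. Sizes are assumptions;
# GBAN's gated bidirectional alignment across modalities is not reproduced.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentivePooledLSTM(nn.Module):
    def __init__(self, in_dim=40, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)      # attention score per time step

    def forward(self, x):                          # x: (N, T, in_dim)
        h, _ = self.lstm(x)                        # (N, T, 2*hidden)
        w = F.softmax(self.score(h), dim=1)        # (N, T, 1) attention weights
        return (w * h).sum(dim=1)                  # attention-weighted summary


if __name__ == "__main__":
    pooled = AttentivePooledLSTM()(torch.randn(2, 120, 40))
    print(pooled.shape)                            # torch.Size([2, 128])
```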
arXiv Detail & Related papers (2022-01-17T09:46:59Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how to better discriminate between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data towards recognizing gestures.
Results show that our approach recovers the performance with large improvement gains, up to 12.91% in accuracy (ACC) and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Emotional Semantics-Preserved and Feature-Aligned CycleGAN for Visual Emotion Adaptation [85.20533077846606]
Unsupervised domain adaptation (UDA) studies the problem of transferring models trained on one labeled source domain to another unlabeled target domain.
In this paper, we focus on UDA in visual emotion analysis for both emotion distribution learning and dominant emotion classification.
We propose a novel end-to-end cycle-consistent adversarial model, termed CycleEmotionGAN++.
arXiv Detail & Related papers (2020-11-25T01:31:01Z) - Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation [73.84166499988443]
We present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA).
Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives.
Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images.
arXiv Detail & Related papers (2020-02-06T13:49:47Z)