Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition
- URL: http://arxiv.org/abs/2303.08356v3
- Date: Wed, 6 Sep 2023 08:15:03 GMT
- Title: Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition
- Authors: Weiwei Zhou, Jiada Lu, Zhaolong Xiong, Weifeng Wang
- Abstract summary: We present our approach to the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge.
We propose a novel multi-modal fusion model that leverages Temporal Convolutional Networks (TCN) and Transformer to enhance the performance of continuous emotion recognition.
- Score: 0.5370906227996627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human emotion recognition plays an important role in human-computer
interaction. In this paper, we present our approach to the Valence-Arousal (VA)
Estimation Challenge, Expression (Expr) Classification Challenge, and Action
Unit (AU) Detection Challenge of the 5th Workshop and Competition on Affective
Behavior Analysis in-the-wild (ABAW). Specifically, we propose a novel
multi-modal fusion model that leverages Temporal Convolutional Networks (TCN)
and Transformer to enhance the performance of continuous emotion recognition.
Our model aims to effectively integrate visual and audio information for
improved accuracy in recognizing emotions. Our model outperforms the baseline
and ranks third in the Expression Classification Challenge.
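The abstract's core design lends itself to a compact sketch. The following is a minimal, hypothetical PyTorch rendering of the idea it describes (a TCN front end per modality, feature-level fusion, a Transformer encoder, and a valence-arousal head); the dimensions, block counts, and concatenate-then-encode fusion order are assumptions, not the authors' published configuration.

```python
# Illustrative sketch only: TCN per modality -> concatenate -> Transformer
# fusion -> per-frame valence/arousal regression. Sizes are assumptions.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal temporal-convolution block with a residual path."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad only, to stay causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))
        return self.relu(out) + x  # residual keeps the sequence length intact

class VisualAudioFusion(nn.Module):
    """TCN per modality -> concatenate -> Transformer encoder -> VA head."""
    def __init__(self, vis_dim=512, aud_dim=128, d_model=256, n_layers=4):
        super().__init__()
        self.vis_proj = nn.Conv1d(vis_dim, d_model, kernel_size=1)
        self.aud_proj = nn.Conv1d(aud_dim, d_model, kernel_size=1)
        self.vis_tcn = nn.Sequential(*[TCNBlock(d_model, dilation=2 ** i) for i in range(3)])
        self.aud_tcn = nn.Sequential(*[TCNBlock(d_model, dilation=2 ** i) for i in range(3)])
        layer = nn.TransformerEncoderLayer(d_model=2 * d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(2 * d_model, 2)  # one valence and one arousal value

    def forward(self, vis, aud):  # both (batch, time, feat_dim)
        v = self.vis_tcn(self.vis_proj(vis.transpose(1, 2)))
        a = self.aud_tcn(self.aud_proj(aud.transpose(1, 2)))
        fused = torch.cat([v, a], dim=1).transpose(1, 2)  # (batch, time, 2*d_model)
        return torch.tanh(self.head(self.fusion(fused)))  # per-frame VA in [-1, 1]

# 2 clips, 30 aligned frames: 512-d visual features, 128-d audio features
model = VisualAudioFusion()
va = model(torch.randn(2, 30, 512), torch.randn(2, 30, 128))  # shape (2, 30, 2)
```

The causal left-padding in the TCN blocks means each frame's prediction depends only on past context, which is one common design choice for continuous (frame-by-frame) emotion recognition.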
Related papers
- Boosting Continuous Emotion Recognition with Self-Pretraining using Masked Autoencoders, Temporal Convolutional Networks, and Transformers [3.951847822557829]
We tackle the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge.
Our study advocates a novel approach aimed at refining continuous emotion recognition.
We achieve this by pre-training with Masked Autoencoders (MAE) on facial datasets, followed by fine-tuning on the Aff-Wild2 dataset annotated with expression (Expr) labels (a sketch of this two-stage recipe follows this entry).
arXiv Detail & Related papers (2024-03-18T03:28:01Z)
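The two-stage recipe above (self-supervised MAE pre-training, then supervised fine-tuning) can be outlined compactly. This is a hypothetical sketch, not the paper's implementation: `encoder` and `decoder` are assumed placeholder modules, the decoder's `(latent, keep, n_patches)` signature is invented for illustration, and the ViT internals are elided.

```python
# Hedged outline of MAE pre-training followed by expression fine-tuning.
# `encoder`/`decoder` are assumed placeholders; the decoder signature is
# hypothetical, not taken from the paper.
import torch
import torch.nn.functional as F

def mae_pretrain_step(encoder, decoder, patches, mask_ratio=0.75):
    """Mask most patches at random; reconstruct them from the visible ones."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    keep = torch.rand(B, N).argsort(dim=1)[:, :n_keep]   # random visible subset
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    latent = encoder(visible)                  # encode visible patches only
    recon = decoder(latent, keep, N)           # decoder restores all N patch slots
    return F.mse_loss(recon, patches)          # reconstruction objective

def finetune_step(encoder, cls_head, patches, expr_labels):
    """Stage 2: drop the decoder, train encoder + classifier on Expr labels."""
    logits = cls_head(encoder(patches).mean(dim=1))   # mean-pool patch tokens
    return F.cross_entropy(logits, expr_labels)
```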
- Affective Behaviour Analysis via Integrating Multi-Modal Knowledge [24.74463315135503]
The 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets.
We present our method designs for the five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation.
arXiv Detail & Related papers (2024-03-16T06:26:43Z)
- The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition [53.718777420180395]
This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.
The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors.
arXiv Detail & Related papers (2024-02-29T16:49:38Z)
- Watch the Speakers: A Hybrid Continuous Attribution Network for Emotion Recognition in Conversation With Emotion Disentanglement [8.17164107060944]
Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field.
Existing ERC methods face challenges in achieving generalization to diverse scenarios due to insufficient modeling of context.
We present a Hybrid Continuous Attributive Network (HCAN) to address these issues from the perspectives of emotional continuation and emotional attribution.
arXiv Detail & Related papers (2023-09-18T14:18:16Z)
- EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for the ERC task.
Our EmotionIC consists of three main components, i.e., Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF); a sketch of the identity-masking idea follows this entry.
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z)
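The component names in the EmotionIC entry suggest attention gated by speaker identity. As one rough, hypothetical reading of "Identity Masked Multi-Head Attention" (not the paper's exact formulation), attention can be masked so each utterance attends only to turns from the same speaker:

```python
# Hedged sketch of an identity-masked attention layer: block attention across
# different speakers. This is an interpretation of the component's name, not
# the published EmotionIC architecture.
import torch
import torch.nn as nn

def identity_mask(speaker_ids: torch.Tensor) -> torch.Tensor:
    """True where attention is BLOCKED: turns spoken by a different speaker."""
    return speaker_ids.unsqueeze(2) != speaker_ids.unsqueeze(1)   # (B, T, T)

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
utterances = torch.randn(2, 10, 256)        # 2 dialogues, 10 turns, 256-d features
speakers = torch.randint(0, 2, (2, 10))     # speaker id per turn

# nn.MultiheadAttention expects a (batch * num_heads, T, T) boolean mask
mask = identity_mask(speakers).repeat_interleave(4, dim=0)
same_speaker_ctx, _ = attn(utterances, utterances, utterances, attn_mask=mask)
```

The diagonal is never blocked (every turn shares a speaker with itself), so no attention row is fully masked; a complementary cross-speaker mask could model contagion the same way.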
- Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers [47.16005553291036]
We present our solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023.
For the Expression Classification Challenge, we propose a streamlined approach that handles the challenges of classification effectively.
By studying, analyzing, and combining the extracted features, we significantly enhance the model's accuracy for sentiment prediction in a multimodal context.
arXiv Detail & Related papers (2023-03-16T09:03:17Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from the speech and text modalities (a minimal fusion sketch follows this entry).
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
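Late fusion, as described in the entry above, combines modality-specific embeddings only at the decision stage. A minimal sketch, assuming pre-trained speech and text encoders supplied as black boxes; the encoder choices, embedding sizes, and head width are illustrative, not the paper's:

```python
# Minimal late-fusion sketch: each frozen or fine-tuned encoder yields one
# utterance embedding; a small head fuses them for emotion classification.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, speech_encoder, text_encoder,
                 speech_dim=192, text_dim=768, n_emotions=4):
        super().__init__()
        self.speech_encoder = speech_encoder   # e.g., a speaker-recognition model
        self.text_encoder = text_encoder       # e.g., a BERT-based sentence encoder
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, n_emotions),
        )

    def forward(self, waveform, token_ids):
        s = self.speech_encoder(waveform)      # (batch, speech_dim) embedding
        t = self.text_encoder(token_ids)       # (batch, text_dim) embedding
        return self.head(torch.cat([s, t], dim=-1))   # fuse at the decision stage
```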
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, fine-tune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked-text prediction (illustrated in the sketch after this entry).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
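The prompt-based reformulation described above can be shown with a text-only toy example: append a prompt such as "I am [MASK]." and score candidate emotion words at the mask position. MEmoBERT itself is multimodal and pre-trained on paired data; the plain bert-base-uncased model and the verbalizer words below are stand-in assumptions.

```python
# Toy illustration of prompt-based emotion classification as masked-word
# prediction. Model and verbalizer words are assumptions, not MEmoBERT.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
emotion_words = ["happy", "sad", "angry", "neutral"]   # label -> verbalizer word

text = "I can't believe we actually won the finals!"
inputs = tokenizer(f"{text} I am {tokenizer.mask_token}.", return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]       # vocab logits at [MASK]
scores = logits[tokenizer.convert_tokens_to_ids(emotion_words)]
print(emotion_words[int(scores.argmax())])             # highest-scoring emotion word
```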
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation (a minimal augmentation sketch follows this entry).
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
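A minimal SpecAugment-style sketch of the augmentation the entry above mentions: zero out a random frequency band and a random time span in a mel-spectrogram, so the model cannot rely on any single band or segment. The mask widths, and whether the paper uses exactly these masks, are assumptions.

```python
# Hedged SpecAugment-style masking for speech emotion recognition data
# scarcity: one random frequency mask plus one random time mask.
import torch

def augment_spectrogram(spec: torch.Tensor, max_f: int = 8, max_t: int = 20):
    """spec: (n_mels, n_frames); zero one frequency band and one time span."""
    spec = spec.clone()
    n_mels, n_frames = spec.shape
    f = int(torch.randint(0, max_f + 1, ()))            # frequency-mask width
    f0 = int(torch.randint(0, n_mels - f + 1, ()))
    spec[f0:f0 + f, :] = 0.0
    t = int(torch.randint(0, max_t + 1, ()))            # time-mask width
    t0 = int(torch.randint(0, n_frames - t + 1, ()))
    spec[:, t0:t0 + t] = 0.0
    return spec

augmented = augment_spectrogram(torch.randn(64, 300))   # 64 mel bins, 300 frames
```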
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine be able to recognize the user's emotional state with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition (a toy pipeline sketch follows this entry).
arXiv Detail & Related papers (2020-01-31T17:47:16Z)
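The pipeline in this last entry separates representation learning from regression: a convolutional autoencoder learns compact face features, and a support vector regressor maps them to continuous emotion values. A toy sketch under assumed input sizes (64x64 grayscale faces) and random stand-in data:

```python
# Hedged sketch: convolutional autoencoder features -> SVR for continuous
# valence. Architecture, input size, and data are illustrative assumptions.
import torch
import torch.nn as nn
from sklearn.svm import SVR

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # 1x64x64 -> 32x16x16
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(               # mirror back to 1x64x64
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Stage 1: train the autoencoder with a reconstruction loss (training elided).
ae = ConvAutoencoder()
faces = torch.rand(100, 1, 64, 64)                  # toy stand-in face images
feats = ae.encoder(faces).flatten(1).detach().numpy()   # bottleneck features

# Stage 2: fit an SVR on the learned features against valence annotations.
valence = torch.rand(100).numpy()                   # toy stand-in labels
svr = SVR(kernel="rbf").fit(feats, valence)
pred = svr.predict(feats[:5])                       # continuous valence estimates
```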