Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation
- URL: http://arxiv.org/abs/2403.11757v2
- Date: Tue, 19 Mar 2024 18:14:35 GMT
- Title: Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation
- Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu
- Abstract summary: The Emotional Mimicry Intensity (EMI) Estimation challenge task aims to evaluate the emotional intensity of seed videos.
We extracted rich dual-channel visual features based on ResNet18 and AUs for the video modality and effective single-channel features based on Wav2Vec2.0 for the audio modality.
We averaged the predictions of the visual and acoustic models, resulting in a more accurate estimation of audiovisual emotional mimicry intensity.
- Score: 8.529105068848828
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present our solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain", "Excitement" and "Joy"). To tackle this challenge, we extracted rich dual-channel visual features based on ResNet18 and Action Units (AUs) for the video modality and effective single-channel features based on Wav2Vec2.0 for the audio modality. This allowed us to obtain comprehensive emotional features for the audiovisual modality. Additionally, leveraging a late fusion strategy, we averaged the predictions of the visual and acoustic models, resulting in a more accurate estimation of audiovisual emotional mimicry intensity. Experimental results validate the effectiveness of our approach, with the average Pearson's Correlation Coefficient ($\rho$) across the 6 emotion dimensions on the validation set reaching 0.3288.
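For concreteness, below is a minimal sketch of the pipeline the abstract describes: mean-pooled Wav2Vec2.0 embeddings as the single-channel audio features, a simple average of the two branches' six-dimensional intensity predictions (late fusion), and the challenge metric, the mean Pearson's $\rho$ across the six emotion dimensions. The checkpoint name, the mean-pooling step, and all function names are illustrative assumptions; the paper's exact visual branch (ResNet18 + AU features) and regression heads are not specified in this summary.

```python
# Minimal sketch of the audio branch, late fusion, and evaluation metric.
# The Wav2Vec2 checkpoint, the mean pooling, and the function names are
# illustrative assumptions, not the authors' exact pipeline.
import numpy as np
import torch
from scipy.stats import pearsonr
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

EMOTIONS = ["Admiration", "Amusement", "Determination",
            "Empathic Pain", "Excitement", "Joy"]

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
wav2vec = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").eval()

def audio_features(waveform_16khz: np.ndarray) -> torch.Tensor:
    """Single-channel audio embedding: mean-pool Wav2Vec2.0 frame features."""
    inputs = extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        frames = wav2vec(inputs.input_values).last_hidden_state  # (1, T, 768)
    return frames.mean(dim=1).squeeze(0)                         # (768,)

def late_fusion(visual_pred: np.ndarray, acoustic_pred: np.ndarray) -> np.ndarray:
    """Late fusion: average the per-branch 6-d intensity predictions."""
    return (visual_pred + acoustic_pred) / 2.0

def mean_pearson_rho(preds: np.ndarray, labels: np.ndarray) -> float:
    """Challenge metric: mean Pearson's rho over the 6 emotion dimensions.

    preds, labels: arrays of shape (num_videos, 6).
    """
    return float(np.mean([pearsonr(preds[:, k], labels[:, k])[0]
                          for k in range(len(EMOTIONS))]))
```

Given per-video outputs from the visual and acoustic models, `late_fusion` would be applied row-wise before scoring the fused predictions with `mean_pearson_rho`.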
Related papers
- The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition [53.718777420180395]
This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.
The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors.
arXiv Detail & Related papers (2024-02-29T16:49:38Z)
- Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving its non-emotional components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z)
- Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos [16.28109151595872]
We introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW).
We exploited multimodal features extracted from video of different lengths from the competition dataset, including audio, pose and images.
Our system achieves the performance of 0.361 on the validation dataset.
arXiv Detail & Related papers (2023-03-18T14:08:06Z)
- A Dual Branch Network for Emotional Reaction Intensity Estimation [12.677143408225167]
We propose a solution to the ERI challenge of the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition: a dual-branch, multi-output regression model.
Spatial attention is used to better extract visual features, and Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract acoustic features.
Our method achieves excellent results on the official validation set.
arXiv Detail & Related papers (2023-03-16T10:31:40Z)
- Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers [47.16005553291036]
We present our solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023.
For the Expression Classification Challenge, we propose a streamlined approach that handles the challenges of classification effectively.
By studying, analyzing, and combining these features, we significantly enhance the model's accuracy for sentiment prediction in a multimodal context.
arXiv Detail & Related papers (2023-03-16T09:03:17Z)
- Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition [0.5370906227996627]
We present our approach to the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge.
We propose a novel multi-modal fusion model that leverages Temporal Convolutional Networks (TCN) and Transformer to enhance the performance of continuous emotion recognition.
arXiv Detail & Related papers (2023-03-15T04:15:57Z)
- ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges [62.413819189049946]
The 5th Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop, which will be held in conjunction with the IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023.
For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset.
The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities.
arXiv Detail & Related papers (2023-03-02T18:58:15Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) over the past two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition [62.48806555665122]
We describe our approaches in EmotiW 2019, which mainly explore emotion features and feature fusion strategies for the audio and visual modalities.
With careful evaluation, we obtain 65.5% on the AFEW validation set and 62.48% on the test set, ranking third in the challenge.
arXiv Detail & Related papers (2020-12-27T10:50:24Z)