Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
- URL: http://arxiv.org/abs/2403.10825v1
- Date: Sat, 16 Mar 2024 06:26:43 GMT
- Title: Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
- Authors: Wei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tiancheng Guo, Xin Yu
- Abstract summary: The 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets.
We present our method designs for the five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation.
- Score: 24.74463315135503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Affective Behavior Analysis aims to make technology emotionally intelligent, creating a world where devices can understand and react to our emotions as humans do. To comprehensively evaluate the authenticity and applicability of emotional behavior analysis techniques in natural environments, the 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets to set up five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation. In this paper, we present our method designs for the five tasks. Specifically, our design comprises three main aspects: 1) a transformer-based feature fusion module that fully integrates the emotional information carried by audio signals, visual images, and transcripts, offering high-quality expression features for the downstream tasks; 2) a Masked Autoencoder, fine-tuned on our facial dataset, serving as the visual feature extractor and yielding high-quality facial representations; 3) given the complexity of the video collection scenes, a finer-grained dataset division based on scene characteristics, with a separate classifier trained for each scene. Extensive experiments demonstrate the superiority of our designs.
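No implementation accompanies this listing; as a minimal sketch of design aspect 1, the snippet below shows one plausible transformer-based fusion of audio, visual, and transcript features in PyTorch. All dimensions, layer counts, the learned modality embeddings, and the mean-pooling readout are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Sketch of a transformer-based fusion module that merges audio,
    visual, and text features into one expression embedding.
    Shapes and depth are assumptions, not the paper's values."""

    def __init__(self, d_audio=512, d_visual=768, d_text=768,
                 d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.proj = nn.ModuleDict({
            "audio": nn.Linear(d_audio, d_model),
            "visual": nn.Linear(d_visual, d_model),
            "text": nn.Linear(d_text, d_model),
        })
        # Learned modality embeddings so the encoder can tell streams apart.
        self.modality_emb = nn.Parameter(torch.zeros(3, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, audio, visual, text):
        # Each input: (batch, seq_len_m, d_m) for modality m.
        tokens = []
        for i, (name, x) in enumerate(
                [("audio", audio), ("visual", visual), ("text", text)]):
            tokens.append(self.proj[name](x) + self.modality_emb[i])
        fused = self.encoder(torch.cat(tokens, dim=1))
        return fused.mean(dim=1)  # pooled feature for downstream heads


# Dummy usage: batch of 2, arbitrary per-modality sequence lengths.
fusion = MultiModalFusion()
feat = fusion(torch.randn(2, 10, 512), torch.randn(2, 16, 768),
              torch.randn(2, 8, 768))
print(feat.shape)  # torch.Size([2, 256])
```

Design aspect 3 would then route this pooled feature to a scene-specific classifier head, e.g. an nn.ModuleDict of classifiers keyed by scene label.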
Related papers
- Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors [63.194053817609024]
We introduce eye behaviors as important emotional cues for the creation of a new Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset.
For the first time, we provide annotations for both Emotion Recognition (ER) and Facial Expression Recognition (FER) in the EMER dataset.
We specifically design a new EMERT architecture to concurrently enhance performance in both ER and FER.
arXiv Detail & Related papers (2024-11-08T04:53:55Z) - Affective Behaviour Analysis via Progressive Learning [23.455163723584427]
We present our methods and experimental results for the two competition tracks.
We train a Masked Autoencoder in a self-supervised manner to attain high-quality facial features.
We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions.
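As a rough illustration of this curriculum idea (the warm-up length and the two-stage split below are assumptions, not details from the paper), the transition might be scheduled like this:

```python
import torch.utils.data as data

def curriculum_dataset(single_ds, compound_ds, epoch, warmup_epochs=10):
    """Stage 1: train on single-expression samples only.
    Stage 2: mix in compound-expression samples once the model has a
    stable notion of the basic expressions. Threshold is illustrative."""
    if epoch < warmup_epochs:
        return single_ds
    return data.ConcatDataset([single_ds, compound_ds])
```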
arXiv Detail & Related papers (2024-07-24T02:24:21Z) - VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning [66.23296689828152]
We leverage the capabilities of Vision-and-Large-Language Models to enhance in-context emotion classification.
In the first stage, we prompt VLLMs to generate natural-language descriptions of the subject's apparent emotion.
In the second stage, these descriptions serve as contextual information and, together with the image input, are used to train a transformer-based architecture.
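A hedged sketch of this two-stage pipeline (the prompt wording and the vllm_generate callable are placeholders, not the paper's actual API):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str
    label: int
    description: str = ""  # filled in by stage 1

def stage1_describe(samples, vllm_generate):
    """Stage 1: prompt a vision-language model for a natural-language
    description of the subject's apparent emotion in each image."""
    prompt = "Describe the apparent emotion of the person in this image."
    for s in samples:
        s.description = vllm_generate(image=s.image_path, prompt=prompt)
    return samples

# Stage 2 (not shown): tokenize each description and train a transformer
# on the (image features, description) pair against the emotion label.
```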
arXiv Detail & Related papers (2024-04-10T15:09:15Z) - Computer Vision Estimation of Emotion Reaction Intensity in the Wild [1.5481864635049696]
We describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge.
We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity.
arXiv Detail & Related papers (2023-03-19T19:09:41Z) - CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory AdaptatiOn (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics of different datasets.
CIAO improves facial expression recognition performance across six datasets, each with a unique affective representation.
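A minimal sketch of the structural idea only, a shared encoder with one exchangeable last layer per dataset; the contrastive inhibitory training objective is omitted, and the backbone, dimensions, and dataset names are assumptions:

```python
import torch.nn as nn

class AdaptedFacialEncoder(nn.Module):
    """Shared facial backbone with a dataset-specific adapted last layer,
    in the spirit of CIAO. Shapes and names are illustrative."""
    def __init__(self, backbone, feat_dim=512,
                 datasets=("AffectNet", "FERPlus", "JAFFE")):
        super().__init__()
        self.backbone = backbone  # shared feature extractor
        self.adapters = nn.ModuleDict(
            {d: nn.Linear(feat_dim, feat_dim) for d in datasets})

    def forward(self, x, dataset):
        # Only the last layer differs across datasets.
        return self.adapters[dataset](self.backbone(x))
```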
arXiv Detail & Related papers (2022-08-10T15:46:05Z) - An Ensemble Approach for Multiple Emotion Descriptors Estimation Using Multi-task Learning [12.589338141771385]
This paper illustrates our submission method to the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition.
Instead of using only face information, we employ the full information from the provided dataset, which contains both the face and the context around it.
The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.
arXiv Detail & Related papers (2022-07-22T04:57:56Z) - SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
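As a sketch of the scene-guided attention idea described here, a single cross-attention step where the scene feature queries the object features; SOLVER's actual module is more elaborate, and the shapes below are assumptions:

```python
import torch
import torch.nn as nn

class SceneGuidedFusion(nn.Module):
    """The scene feature attends over object features, so scene context
    decides how much each detected object contributes."""
    def __init__(self, d=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)

    def forward(self, scene_feat, object_feats):
        # scene_feat: (batch, 1, d); object_feats: (batch, n_objects, d)
        fused, attn_weights = self.attn(scene_feat, object_feats, object_feats)
        return fused.squeeze(1), attn_weights


# Dummy usage: one scene vector attending over five object vectors.
fusion = SceneGuidedFusion()
out, w = fusion(torch.randn(2, 1, 256), torch.randn(2, 5, 256))
print(out.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 1, 5])
```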
arXiv Detail & Related papers (2021-10-24T02:41:41Z) - Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction.
To the best of our knowledge, this is the first work to introduce a stimuli selection process into VEA in an end-to-end network.
Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
arXiv Detail & Related papers (2021-09-04T08:14:52Z) - Prior Aided Streaming Network for Multi-task Affective Recognition at the 2nd ABAW2 Competition [9.188777864190204]
We introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW2) Competition.
To handle the different emotion representations, we propose a multi-task streaming network.
We leverage an advanced facial expression embedding as prior knowledge.
arXiv Detail & Related papers (2021-07-08T09:35:08Z) - Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z) - A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
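For reference, the Squeeze-and-Excitation block named here is a standard component (Hu et al., 2018); a minimal version looks like this (the reduction ratio of 16 is the common default, not necessarily this paper's choice):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation block: squeeze spatial information
    into per-channel statistics, then learn channel-wise rescaling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average pool
        self.fc = nn.Sequential(              # excitation: gating MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight each channel of the feature map
```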
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.