How Do You Perceive My Face? Recognizing Facial Expressions in Multi-Modal Context by Modeling Mental Representations
- URL: http://arxiv.org/abs/2409.02566v1
- Date: Wed, 4 Sep 2024 09:32:40 GMT
- Title: How Do You Perceive My Face? Recognizing Facial Expressions in Multi-Modal Context by Modeling Mental Representations
- Authors: Florian Blume, Runfeng Qu, Pia Bideau, Martin Maier, Rasha Abdel Rahman, Olaf Hellwich
- Abstract summary: We introduce a novel approach for facial expression classification that goes beyond simple classification tasks.
Our model accurately classifies a perceived face and synthesizes the corresponding mental representation perceived by a human when observing a face in context.
We evaluate synthesized expressions in a human study, showing that our model effectively produces approximations of human mental representations.
- Score: 5.895694050664867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial expression perception in humans inherently relies on prior knowledge and contextual cues, contributing to efficient and flexible processing. For instance, multi-modal emotional context (such as voice color, affective text, body pose, etc.) can prompt people to perceive emotional expressions in objectively neutral faces. Drawing inspiration from this, we introduce a novel approach for facial expression classification that goes beyond simple classification tasks. Our model accurately classifies a perceived face and synthesizes the corresponding mental representation perceived by a human when observing a face in context. With this, our model offers visual insights into its internal decision-making process. We achieve this by learning two independent representations of content and context using a VAE-GAN architecture. Subsequently, we propose a novel attention mechanism for context-dependent feature adaptation. The adapted representation is used for classification and to generate a context-augmented expression. We evaluate synthesized expressions in a human study, showing that our model effectively produces approximations of human mental representations. We achieve State-of-the-Art classification accuracies of 81.01% on the RAVDESS dataset and 79.34% on the MEAD dataset. We make our code publicly available.
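As a reading aid, here is a minimal PyTorch sketch of the pipeline the abstract describes: two VAE encoders producing independent content (face) and context latents, a cross-attention step that adapts the content latent to the multi-modal context, and heads for expression classification and for decoding a context-augmented expression. Module names, dimensions, and the use of standard multi-head cross-attention are illustrative assumptions, not the authors' released code; the GAN discriminator, training losses, and the image-space generator are omitted.
```python
# Illustrative sketch of the described content/context VAE-GAN pipeline.
# Names and shapes are assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn


class VAEEncoder(nn.Module):
    """Maps an input feature vector to a latent Gaussian (mu, logvar)."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.logvar(h)


def reparameterize(mu, logvar):
    """Standard VAE reparameterization trick."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)


class ContextAttention(nn.Module):
    """Adapts the face (content) latent using the context latent.

    One cross-attention step: the content latent queries the context latent.
    """

    def __init__(self, latent_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)

    def forward(self, content_z, context_z):
        # Treat each latent vector as a length-1 token sequence.
        q = content_z.unsqueeze(1)
        kv = context_z.unsqueeze(1)
        adapted, _ = self.attn(q, kv, kv)
        return adapted.squeeze(1)


class ContextualFERModel(nn.Module):
    """Classifies the expression and decodes a context-augmented representation."""

    def __init__(self, face_dim=512, ctx_dim=512, latent_dim=128, num_classes=8):
        super().__init__()
        self.face_enc = VAEEncoder(face_dim, latent_dim)
        self.ctx_enc = VAEEncoder(ctx_dim, latent_dim)
        self.adapt = ContextAttention(latent_dim)
        self.classifier = nn.Linear(latent_dim, num_classes)
        # Stand-in for the GAN generator that would synthesize the
        # context-augmented face image from the adapted latent.
        self.decoder = nn.Linear(latent_dim, face_dim)

    def forward(self, face_feat, ctx_feat):
        f_mu, f_logvar = self.face_enc(face_feat)
        c_mu, c_logvar = self.ctx_enc(ctx_feat)
        z_face = reparameterize(f_mu, f_logvar)
        z_ctx = reparameterize(c_mu, c_logvar)
        z_adapted = self.adapt(z_face, z_ctx)          # context-dependent adaptation
        logits = self.classifier(z_adapted)             # expression classification
        recon = self.decoder(z_adapted)                 # context-augmented expression (latent proxy)
        return logits, recon, (f_mu, f_logvar, c_mu, c_logvar)


if __name__ == "__main__":
    model = ContextualFERModel()
    face = torch.randn(4, 512)      # e.g. pooled face-image features
    context = torch.randn(4, 512)   # e.g. fused voice/text/pose features
    logits, recon, _ = model(face, context)
    print(logits.shape, recon.shape)  # torch.Size([4, 8]) torch.Size([4, 512])
```
In the sketch the decoder is a linear stand-in; in the paper's setting the adapted latent would be passed to a GAN generator to synthesize the context-augmented face image evaluated in the human study.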
Related papers
- Beyond Vision: How Large Language Models Interpret Facial Expressions from Valence-Arousal Values [6.987852837732702]
Large Language Models primarily operate through text-based inputs and outputs, yet human emotion is communicated through both verbal and non-verbal cues, including facial expressions.
This study investigates whether LLMs can infer affective meaning from the Valence and Arousal dimensions of facial expressions, provided as structured numerical representations, rather than from raw visual input.
arXiv Detail & Related papers (2025-02-08T09:54:03Z)
- Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase [0.0]
We evaluate the similarities and differences between semantic representations derived from text, behavior, and brain data.
We establish behavior as an important complement to text for capturing human semantic representations.
arXiv Detail & Related papers (2024-12-06T10:44:20Z)
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z)
- ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos [3.1489012476109854]
We propose a framework for emotion recognition through the analysis of gait.
Our model is composed of a sequence of spatial-temporal Graph Convolutional Networks.
We evaluate our proposed framework on the E-Gait dataset, composed of a total of 2177 samples.
arXiv Detail & Related papers (2024-05-22T18:24:21Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six different datasets, each with very distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)