Learning Unseen Emotions from Gestures via Semantically-Conditioned
Zero-Shot Perception with Adversarial Autoencoders
- URL: http://arxiv.org/abs/2009.08906v2
- Date: Thu, 2 Dec 2021 08:16:02 GMT
- Title: Learning Unseen Emotions from Gestures via Semantically-Conditioned
Zero-Shot Perception with Adversarial Autoencoders
- Authors: Abhishek Banerjee, Uttaran Bhattacharya, Aniket Bera
- Abstract summary: We introduce an adversarial, autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with vectorized representations of natural-language perceived-emotion terms.
We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions.
- Score: 25.774235606472875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel generalized zero-shot algorithm to recognize perceived
emotions from gestures. Our task is to map gestures to novel emotion categories
not encountered in training. We introduce an adversarial, autoencoder-based
representation learning method that correlates 3D motion-captured gesture
sequences with vectorized representations of the natural-language perceived
emotion terms using word2vec embeddings. The language-semantic embedding provides a
representation of the emotion label space, and we leverage this underlying
distribution to map the gesture-sequences to the appropriate categorical
emotion labels. We train our method using a combination of gestures annotated
with known emotion terms and gestures not annotated with any emotions. We
evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and
obtain an accuracy of $58.43\%$. This improves the performance of current
state-of-the-art algorithms for generalized zero-shot learning by an absolute
$25$--$27\%$.
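Below is a minimal sketch, in PyTorch, of the general recipe the abstract describes: an encoder maps a gesture sequence into the word2vec label space, a decoder reconstructs the gesture, and a discriminator adversarially pushes the encoded codes toward the distribution of emotion-word embeddings; at test time a gesture is assigned the nearest emotion-word vector, seen or unseen. The skeleton size, network shapes, losses, and training details are assumptions for illustration, not the authors' implementation.
```python
# Minimal sketch (not the authors' code) of a semantically-conditioned
# adversarial autoencoder for generalized zero-shot gesture-to-emotion mapping.
# Assumptions: fixed-length pose sequences of shape (T, POSE_DIM), precomputed
# 300-d word2vec vectors for emotion terms, illustrative hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 300      # word2vec dimensionality (assumed)
POSE_DIM = 21 * 3    # assumed skeleton: 21 joints x (x, y, z) per frame

class GestureEncoder(nn.Module):
    """Encodes a gesture sequence into the semantic (word2vec) space."""
    def __init__(self, hidden=256):
        super().__init__()
        self.gru = nn.GRU(POSE_DIM, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, EMBED_DIM)

    def forward(self, poses):                  # poses: (B, T, POSE_DIM)
        _, h = self.gru(poses)                 # h: (1, B, hidden)
        return self.fc(h.squeeze(0))           # (B, EMBED_DIM)

class GestureDecoder(nn.Module):
    """Reconstructs the gesture sequence from a latent semantic code."""
    def __init__(self, seq_len, hidden=256):
        super().__init__()
        self.seq_len = seq_len
        self.fc = nn.Linear(EMBED_DIM, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, POSE_DIM)

    def forward(self, z):                      # z: (B, EMBED_DIM)
        h = self.fc(z).unsqueeze(1).repeat(1, self.seq_len, 1)
        y, _ = self.gru(h)
        return self.out(y)                     # (B, T, POSE_DIM)

class Discriminator(nn.Module):
    """Tells encoded gesture codes apart from true emotion-word embeddings."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMBED_DIM, 128),
                                 nn.LeakyReLU(0.2),
                                 nn.Linear(128, 1))

    def forward(self, z):
        return self.net(z)                     # logit: "looks like a word vector"

def autoencoder_step(enc, dec, disc, poses, label_vecs, labelled):
    """One illustrative encoder/decoder update. `label_vecs` holds word2vec
    vectors for annotated gestures; `labelled` is a boolean mask, so the
    unannotated gestures contribute only to reconstruction and the adversary."""
    z = enc(poses)
    recon = F.mse_loss(dec(z), poses)                          # all gestures
    align = F.mse_loss(z[labelled], label_vecs[labelled])      # annotated only
    fool = F.binary_cross_entropy_with_logits(                 # push codes toward
        disc(z), torch.ones(z.size(0), 1))                     # the word-vector side
    return recon + align + fool   # the discriminator is updated in a separate step

@torch.no_grad()
def predict_emotion(enc, poses, emotion_vocab_vecs):
    """GZSL inference: nearest emotion-word vector (seen or unseen) by cosine."""
    z = F.normalize(enc(poses), dim=-1)                        # (B, EMBED_DIM)
    v = F.normalize(emotion_vocab_vecs, dim=-1)                # (C, EMBED_DIM)
    return (z @ v.t()).argmax(dim=-1)                          # index into vocab
```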
Related papers
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z) - Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
EmotionCLIP is expected to address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z) - MAFW: A Large-scale, Multi-modal, Compound Affective Database for
Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
arXiv Detail & Related papers (2022-08-01T13:34:33Z) - The Emotion is Not One-hot Encoding: Learning with Grayscale Label for
Emotion Recognition in Conversation [0.0]
In emotion recognition in conversation (ERC), the emotion of the current utterance is predicted by considering the previous context.
We introduce several methods for constructing grayscale labels and confirm that each method improves emotion recognition performance (see the illustrative soft-label sketch after this list).
arXiv Detail & Related papers (2022-06-15T08:14:42Z) - A Circular-Structured Representation for Visual Emotion Distribution
Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z) - Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z) - A Generalized Zero-Shot Framework for Emotion Recognition from Body
Gestures [5.331671302839567]
We introduce a Generalized Zero-Shot Learning (GZSL) framework to infer the emotional state of new body gestures.
The framework significantly outperforms both traditional emotion classification methods and state-of-the-art zero-shot learning methods.
arXiv Detail & Related papers (2020-10-13T13:16:38Z) - Modality-Transferable Emotion Embeddings for Low-Resource Multimodal
Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z) - ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for
Socially-Aware Robot Navigation [65.11858854040543]
We present ProxEmo, a novel end-to-end emotion prediction algorithm for robot navigation among pedestrians.
Our approach predicts the perceived emotions of a pedestrian from walking gaits, which is then used for emotion-guided navigation.
arXiv Detail & Related papers (2020-03-02T17:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.