Learning Unseen Emotions from Gestures via Semantically-Conditioned
Zero-Shot Perception with Adversarial Autoencoders
- URL: http://arxiv.org/abs/2009.08906v2
- Date: Thu, 2 Dec 2021 08:16:02 GMT
- Title: Learning Unseen Emotions from Gestures via Semantically-Conditioned
Zero-Shot Perception with Adversarial Autoencoders
- Authors: Abhishek Banerjee, Uttaran Bhattacharya, Aniket Bera
- Abstract summary: We introduce an adversarial, autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with the vectorized representation of the natural-language perceived emotion terms.
We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions.
- Score: 25.774235606472875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel generalized zero-shot algorithm to recognize perceived
emotions from gestures. Our task is to map gestures to novel emotion categories
not encountered in training. We introduce an adversarial, autoencoder-based
representation learning method that correlates 3D motion-captured gesture sequences
with the vectorized representation of the natural-language perceived emotion
terms using word2vec embeddings. The language-semantic embedding provides a
representation of the emotion label space, and we leverage this underlying
distribution to map the gesture-sequences to the appropriate categorical
emotion labels. We train our method using a combination of gestures annotated
with known emotion terms and gestures not annotated with any emotions. We
evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and
obtain an accuracy of $58.43\%$. This improves the performance of current
state-of-the-art algorithms for generalized zero-shot learning by $25$--$27\%$
in absolute terms.
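As a rough sketch of the approach described above, the minimal PyTorch example below uses assumed layer sizes and hypothetical names (GestureEncoder, SemanticDiscriminator, etc.) rather than the authors' released implementation. It shows an encoder mapping a flattened gesture sequence into a latent space sized to match 300-dimensional word2vec emotion-term embeddings, a decoder reconstructing the gesture, and a discriminator adversarially aligning latent gesture codes with the semantic label embeddings; zero-shot prediction then picks the emotion term whose embedding is nearest to the encoded gesture.

```python
# Minimal sketch of a semantically-conditioned adversarial autoencoder for
# generalized zero-shot emotion recognition from gestures (hypothetical; layer
# sizes, names, and losses are assumptions, not the authors' code).
import torch
import torch.nn as nn

POSE_DIM = 69 * 30     # assumed: 23 joints x 3 coordinates over 30 frames, flattened
LATENT_DIM = 300       # matched to the word2vec embedding size
WORD2VEC_DIM = 300

class GestureEncoder(nn.Module):
    """Maps a flattened gesture sequence to a latent code in the semantic space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(POSE_DIM, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM),
        )
    def forward(self, x):
        return self.net(x)

class GestureDecoder(nn.Module):
    """Reconstructs the flattened gesture sequence from the latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, POSE_DIM),
        )
    def forward(self, z):
        return self.net(z)

class SemanticDiscriminator(nn.Module):
    """Distinguishes word2vec emotion embeddings from latent gesture codes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, z):
        return self.net(z)

def training_step(enc, dec, disc, gestures, label_embeddings, opt_ae, opt_d):
    """One adversarial-autoencoder update.

    gestures:         (B, POSE_DIM) flattened motion-capture sequences.
    label_embeddings: (B, WORD2VEC_DIM) word2vec vectors of the annotated emotion
                      terms; for unannotated gestures, embeddings sampled from the
                      seen-label distribution could stand in (an assumption here).
    """
    bce = nn.BCEWithLogitsLoss()
    mse = nn.MSELoss()
    real = torch.ones(len(gestures), 1)
    fake = torch.zeros(len(gestures), 1)

    # Discriminator step: semantic embeddings are "real", gesture codes are "fake".
    z = enc(gestures).detach()
    d_loss = bce(disc(label_embeddings), real) + bce(disc(z), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Autoencoder step: reconstruct gestures and push codes toward the semantic space.
    z = enc(gestures)
    ae_loss = mse(dec(z), gestures) + bce(disc(z), real)
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return d_loss.item(), ae_loss.item()

def predict_emotion(enc, gesture, candidate_embeddings):
    """Zero-shot inference: return the index of the nearest emotion-term embedding."""
    with torch.no_grad():
        z = enc(gesture.unsqueeze(0))                            # (1, LATENT_DIM)
        sims = torch.cosine_similarity(z, candidate_embeddings)  # (num_labels,)
    return sims.argmax().item()
```

Because inference only compares the latent code against candidate label embeddings, unseen emotion categories can be handled at test time simply by adding their word2vec vectors to candidate_embeddings, which is what makes the setup generalized zero-shot.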
Related papers
- Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
EmotionCLIP is expected to address the prevailing issue of data scarcity in emotion understanding, fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z) - MAFW: A Large-scale, Multi-modal, Compound Affective Database for
Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
arXiv Detail & Related papers (2022-08-01T13:34:33Z) - The Emotion is Not One-hot Encoding: Learning with Grayscale Label for
Emotion Recognition in Conversation [0.0]
In emotion recognition in conversation (ERC), the emotion of the current utterance is predicted by considering the previous context.
We introduce several methods for constructing grayscale labels and confirm that each method improves the emotion recognition performance.
arXiv Detail & Related papers (2022-06-15T08:14:42Z) - A Circular-Structured Representation for Visual Emotion Distribution
Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z) - Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z) - A Generalized Zero-Shot Framework for Emotion Recognition from Body
Gestures [5.331671302839567]
We introduce a Generalized Zero-Shot Learning (GZSL) framework to infer the emotional state of new body gestures.
The framework significantly outperforms both traditional emotion classification methods and state-of-the-art zero-shot learning methods.
arXiv Detail & Related papers (2020-10-13T13:16:38Z) - Modality-Transferable Emotion Embeddings for Low-Resource Multimodal
Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z) - ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for
Socially-Aware Robot Navigation [65.11858854040543]
We present ProxEmo, a novel end-to-end emotion prediction algorithm for robot navigation among pedestrians.
Our approach predicts the perceived emotion of a pedestrian from their walking gait, which is then used for emotion-guided navigation.
arXiv Detail & Related papers (2020-03-02T17:47:49Z) - Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping [55.72376663488104]
We present an autoencoder-based approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data.
Given the motion on each joint in the pose at each time step extracted from 3D pose sequences, we hierarchically pool these joint motions in the encoder.
We train the decoder to reconstruct the motions per joint per time step in a top-down manner from the latent embeddings.
arXiv Detail & Related papers (2019-11-20T05:04:16Z)
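The hierarchical pooling mentioned in the last entry can be illustrated with a short sketch. This is a generic assumption about the technique, with made-up joint groups and feature sizes, not that paper's actual network: per-joint motion features are pooled into body-part features and then into a single body-level embedding, from which a mirrored decoder would reconstruct per-joint motions top-down.

```python
# Hypothetical sketch of hierarchically pooling per-joint motion features into a
# body-level embedding (illustrative only; not the cited paper's architecture).
import torch
import torch.nn as nn

# Assumed grouping of 16 joints into body parts (indices are illustrative).
JOINT_GROUPS = {
    "torso": [0, 1, 2, 3],
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}
JOINT_FEAT = 32   # assumed per-joint motion-feature size
PART_FEAT = 64
BODY_FEAT = 128

class HierarchicalPoolingEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.part_proj = nn.Linear(JOINT_FEAT, PART_FEAT)
        self.body_proj = nn.Linear(PART_FEAT, BODY_FEAT)

    def forward(self, joint_feats):
        # joint_feats: (batch, num_joints, JOINT_FEAT), already extracted per joint.
        part_embs = []
        for idx in JOINT_GROUPS.values():
            joints = joint_feats[:, idx, :]                   # (batch, |group|, JOINT_FEAT)
            part_embs.append(self.part_proj(joints).mean(1))  # pool joints -> body part
        parts = torch.stack(part_embs, dim=1)                 # (batch, num_parts, PART_FEAT)
        return self.body_proj(parts).mean(1)                  # pool parts -> body embedding
```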
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.