Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
- URL: http://arxiv.org/abs/1911.08708v5
- Date: Fri, 22 Nov 2024 22:51:03 GMT
- Title: Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
- Authors: Uttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh Manocha
- Abstract summary: We present an autoencoder-based approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data.
Given the motion of each joint in the pose at each time step, extracted from 3D pose sequences, we hierarchically pool these joint motions in the encoder.
We train the decoder to reconstruct the motions per joint per time step in a top-down manner from the latent embeddings.
- Score: 55.72376663488104
- License:
- Abstract: We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data and represented as sequences of 3D poses. Given the motion of each joint in the pose at each time step, extracted from 3D pose sequences, we hierarchically pool these joint motions in a bottom-up manner in the encoder, following the kinematic chains in the human body. We also constrain the latent embeddings of the encoder to contain the space of psychologically motivated affective features underlying the gaits. We train the decoder to reconstruct the motions per joint per time step in a top-down manner from the latent embeddings. For the annotated data, we also train a classifier to map the latent embeddings to emotion labels. Our semi-supervised approach achieves a mean average precision of 0.84 on the Emotion-Gait benchmark dataset, which contains both labeled and unlabeled gaits collected from multiple sources. We outperform current state-of-the-art algorithms for both emotion recognition and action recognition from 3D gaits by 7%--23% absolute. More importantly, we improve the average precision by 10%--50% absolute on classes that each make up less than 25% of the labeled part of the Emotion-Gait benchmark dataset.
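As a concrete illustration of the training setup the abstract describes, the following is a minimal PyTorch sketch of a hierarchical-pooling encoder, a top-down decoder, an affective-feature head, and an emotion classifier trained semi-supervised (reconstruction and affective terms on every gait, classification only on labeled gaits). The joint grouping, feature dimensions, number of affective features, class count, and loss weighting are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    """Pools per-joint motions bottom-up: joints -> kinematic chains -> whole body -> temporal latent."""

    def __init__(self, num_joints=16, feat_dim=32, latent_dim=64):
        super().__init__()
        self.joint_mlp = nn.Linear(3, feat_dim)  # per-joint, per-step motion -> feature
        # Hypothetical grouping of joint indices into kinematic chains (placeholder indices).
        self.chains = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
        self.chain_mlp = nn.Linear(feat_dim, feat_dim)
        self.body_rnn = nn.GRU(feat_dim, latent_dim, batch_first=True)

    def forward(self, motion):  # motion: (B, T, J, 3), per-joint motion at each time step
        joints = F.relu(self.joint_mlp(motion))                                   # (B, T, J, D)
        chains = torch.stack([joints[:, :, ids].mean(dim=2) for ids in self.chains], dim=2)
        body = F.relu(self.chain_mlp(chains.mean(dim=2)))                         # (B, T, D)
        _, h = self.body_rnn(body)                                                # pool over time
        return h[-1]                                                              # (B, latent_dim)


class TopDownDecoder(nn.Module):
    """Reconstructs the motion per joint per time step from the latent embedding."""

    def __init__(self, num_joints=16, latent_dim=64, seq_len=48):
        super().__init__()
        self.seq_len, self.num_joints = seq_len, num_joints
        self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, num_joints * 3)

    def forward(self, z):  # z: (B, latent_dim)
        h, _ = self.rnn(z.unsqueeze(1).repeat(1, self.seq_len, 1))
        return self.out(h).view(z.size(0), self.seq_len, self.num_joints, 3)


class GaitEmotionModel(nn.Module):
    # num_classes and num_affective are placeholders; the gaits are assumed resampled to 48 steps.
    def __init__(self, num_classes=4, num_affective=16, latent_dim=64):
        super().__init__()
        self.encoder = HierarchicalEncoder(latent_dim=latent_dim)
        self.decoder = TopDownDecoder(latent_dim=latent_dim)
        self.affective_head = nn.Linear(latent_dim, num_affective)  # ties the latent space to affective features
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, motion):
        z = self.encoder(motion)
        return self.decoder(z), self.affective_head(z), self.classifier(z)


def semi_supervised_loss(model, motion, affective_feats, labels=None):
    """Reconstruction + affective-constraint terms on every gait; classification only when labels exist."""
    recon, aff_pred, logits = model(motion)
    loss = F.mse_loss(recon, motion) + F.mse_loss(aff_pred, affective_feats)
    if labels is not None:
        loss = loss + F.cross_entropy(logits, labels)
    return loss
```

Under this setup, unlabeled gaits still shape the encoder through the reconstruction and affective-feature terms, which is what lets the labeled classification head benefit from the unannotated part of the dataset.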
Related papers
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement over state-of-the-art methods in the mean average precision of matching human-annotated highlights.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning [52.73083137245969]
We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions.
Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences.
arXiv Detail & Related papers (2021-07-31T15:13:39Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- Recovering Trajectories of Unmarked Joints in 3D Human Actions Using Latent Space Optimization [16.914342116747825]
Motion capture (mocap) and time-of-flight based sensing of human actions are becoming increasingly popular modalities to perform robust activity analysis.
However, there are several practical challenges in both modalities such as visibility, tracking errors, and simply the need to keep marker setup convenient.
This paper formulates the reconstruction of the unmarked joint data as an ill-posed linear inverse problem.
Experiments on both mocap and Kinect datasets clearly demonstrate that the proposed method performs very well in recovering semantics of the actions and dynamics of missing joints.
arXiv Detail & Related papers (2020-12-03T16:25:07Z)
- Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal Heatmaps [2.2079886535603084]
We propose a depth-aware descriptor that encodes pose and motion information in a unified representation for action classification in-the-wild.
The key component of our method is the Depth-Aware Pose Motion representation (DA-PoTion), a new video descriptor that encodes the 3D movement of semantic keypoints of the human body.
arXiv Detail & Related papers (2020-11-26T17:26:42Z)
- Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders [25.774235606472875]
We introduce an adversarial autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with the vectorized representations of the natural-language perceived emotion terms.
We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions.
arXiv Detail & Related papers (2020-09-18T15:59:44Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, an improvement of 7-8 points over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)
- ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation [65.11858854040543]
We present ProxEmo, a novel end-to-end emotion prediction algorithm for robot navigation among pedestrians.
Our approach predicts the perceived emotions of a pedestrian from walking gaits, which is then used for emotion-guided navigation.
arXiv Detail & Related papers (2020-03-02T17:47:49Z)