STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
- URL: http://arxiv.org/abs/1910.12906v3
- Date: Fri, 22 Nov 2024 22:59:33 GMT
- Title: STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits
- Authors: Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
- Abstract summary: We present a novel network called STEP to classify perceived human emotion from gaits.
We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits.
STEP can learn affective features and achieves a classification accuracy of 89% on E-Gait, which is 14-30% more accurate than prior methods.
- Score: 60.37683428887577
- License:
- Abstract: We present a novel classifier network called STEP to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral. We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of $2,177$ human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn affective features and achieves a classification accuracy of 89% on E-Gait, which is 14-30% more accurate than prior methods.
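The abstract above describes an ST-GCN backbone followed by a classification head over four emotion classes. The PyTorch sketch below is a rough illustration only: the joint count, adjacency matrix, layer sizes, and temporal kernel are assumptions for the example, not the authors' exact configuration, and the push-pull regularized CVAE (STEP-Gen) is not shown.

```python
# Minimal sketch of an ST-GCN style gait emotion classifier (illustrative only).
# Joint count, adjacency, and layer sizes are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

NUM_JOINTS = 16          # assumed skeleton size
NUM_CLASSES = 4          # happy, sad, angry, neutral

class STGCNBlock(nn.Module):
    """One spatial-temporal block: graph conv over joints, then temporal conv over frames."""
    def __init__(self, in_ch, out_ch, adj):
        super().__init__()
        self.register_buffer("adj", adj)                 # (V, V) normalized adjacency
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1), padding=(4, 0))
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()

    def forward(self, x):                                # x: (N, C, T, V)
        x = self.spatial(x)                              # per-joint feature transform
        x = torch.einsum("nctv,vw->nctw", x, self.adj)   # aggregate over the skeleton graph
        return self.relu(self.bn(self.temporal(x)))      # convolve along time

class GaitEmotionClassifier(nn.Module):
    def __init__(self, adj):
        super().__init__()
        self.blocks = nn.Sequential(
            STGCNBlock(3, 32, adj),      # input channels = (x, y, z) joint coordinates
            STGCNBlock(32, 64, adj),
        )
        self.head = nn.Linear(64, NUM_CLASSES)

    def forward(self, x):                # x: (N, 3, T, V) gait sequence
        x = self.blocks(x)
        x = x.mean(dim=[2, 3])           # global average pool over time and joints
        return self.head(x)              # logits over {happy, sad, angry, neutral}

# Usage with a random batch: 8 gaits, 3D joints, 75 frames, 16 joints.
adj = torch.eye(NUM_JOINTS)              # placeholder adjacency; a real skeleton graph would go here
model = GaitEmotionClassifier(adj)
logits = model(torch.randn(8, 3, 75, NUM_JOINTS))
print(logits.shape)                      # torch.Size([8, 4])
```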
Related papers
- LLM-Based Multi-Agent Systems are Scalable Graph Generative Models [73.28294528654885]
GraphAgent-Generator (GAG) is a novel simulation-based framework for dynamic, text-attributed social graph generation.
GAG simulates the temporal node and edge generation processes for zero-shot social graph generation.
The resulting graphs exhibit adherence to seven key macroscopic network properties, achieving an 11% improvement in microscopic graph structure metrics.
arXiv Detail & Related papers (2024-10-13T12:57:08Z)
- EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition [11.027908624804535]
We introduce a novel transformer model called emotion transformer (EmT).
EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks.
Experiments on four publicly available datasets show that EmT outperforms the baseline methods on both EEG emotion classification and regression tasks.
arXiv Detail & Related papers (2024-06-26T13:42:11Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations [66.5809637340079]
We propose the first in-the-wild benchmark CCGait for cloth-changing gait recognition.
We exploit both temporal dynamics and the projected 2D information of 3D human meshes.
Our contributions are twofold: we provide a challenging benchmark CCGait that captures realistic appearance changes across an expanded span of time and space.
arXiv Detail & Related papers (2023-12-30T16:12:13Z)
- Transforming Visual Scene Graphs to Image Captions [69.13204024990672]
We propose to transform Scene Graphs (TSG) into more descriptive captions.
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
In TSG, each expert is built on MHA to discriminate the graph embeddings and generate different kinds of words.
arXiv Detail & Related papers (2023-05-03T15:18:37Z)
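The TSG entry above describes using multi-head attention as the building block of a GNN that embeds scene graphs. The snippet below is a minimal, hypothetical illustration of that idea with PyTorch's nn.MultiheadAttention, restricting attention to graph neighbors; the feature sizes, masking scheme, and pooling are assumptions, not TSG's actual design.

```python
# Illustrative sketch of MHA-based message passing over scene-graph nodes (not TSG's exact design).
import torch
import torch.nn as nn

d_model, num_nodes = 128, 10
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

nodes = torch.randn(1, num_nodes, d_model)       # embeddings of objects/relations in the scene graph
adj = torch.rand(num_nodes, num_nodes) > 0.5     # placeholder adjacency; a real scene graph would go here
attn_mask = ~(adj | torch.eye(num_nodes, dtype=torch.bool))  # attend only to neighbors and self

updated, _ = mha(nodes, nodes, nodes, attn_mask=attn_mask)   # each node aggregates neighbor features
graph_embedding = updated.mean(dim=1)                        # pooled embedding for a downstream decoder
```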
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance on two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- Generative Adversarial Imitation Learning for Empathy-based AI [0.0]
Generative adversarial imitation learning (GAIL) is a model-free algorithm that has been shown to provide strong results in imitating complex behaviors in high-dimensional environments.
In this paper, we utilize the GAIL model for text generation to develop empathy-based context-aware conversational AI.
arXiv Detail & Related papers (2021-05-27T17:37:37Z)
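The GAIL entry above refers to the standard adversarial imitation setup: a discriminator is trained to separate expert data from the policy's own rollouts, and its output serves as the imitation reward for a policy-gradient update. The sketch below shows only that generic discriminator step and reward in PyTorch; the dimensions and networks are placeholders, and the paper's text-generation adaptation is not reproduced.

```python
# Minimal sketch of the GAIL discriminator step (generic formulation, not the paper's variant).
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 8
disc = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    """Train D to separate expert (state, action) pairs from the policy's rollouts."""
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_p, torch.zeros_like(logits_p))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def imitation_reward(policy_sa):
    """Reward for the policy update (e.g. PPO/TRPO): high when D mistakes policy data for expert data."""
    with torch.no_grad():
        return -torch.log(1 - torch.sigmoid(disc(policy_sa)) + 1e-8)

# Usage with random stand-ins for expert demonstrations and policy rollouts.
expert, rollout = torch.randn(64, obs_dim + act_dim), torch.randn(64, obs_dim + act_dim)
discriminator_step(expert, rollout)
print(imitation_reward(rollout).shape)   # torch.Size([64, 1])
```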
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
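The ConsNet entry above describes mapping visual features of human-object pairs and word embeddings of HOI labels into a shared visual-semantic space and scoring them by similarity. Below is a minimal, hypothetical sketch of that joint-embedding-and-similarity idea; the projection layers, dimensions, and cosine scoring are assumptions rather than the paper's architecture.

```python
# Rough sketch of the visual-semantic joint-embedding idea from the ConsNet entry above.
# Projection sizes and the similarity measure are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

vis_dim, word_dim, joint_dim = 1024, 300, 256
num_pairs, num_hoi_labels = 5, 600

vis_proj = nn.Linear(vis_dim, joint_dim)    # maps human-object pair features into the joint space
sem_proj = nn.Linear(word_dim, joint_dim)   # maps HOI label word embeddings into the same space

pair_feats = torch.randn(num_pairs, vis_dim)        # visual features of candidate human-object pairs
label_embs = torch.randn(num_hoi_labels, word_dim)  # word embeddings of HOI labels (seen and unseen)

v = F.normalize(vis_proj(pair_feats), dim=-1)
s = F.normalize(sem_proj(label_embs), dim=-1)
scores = v @ s.t()                                  # cosine similarities: (num_pairs, num_hoi_labels)
pred = scores.argmax(dim=-1)                        # each pair is assigned its closest HOI label
```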