TNTC: two-stream network with transformer-based complementarity for
gait-based emotion recognition
- URL: http://arxiv.org/abs/2110.13708v1
- Date: Tue, 26 Oct 2021 13:55:31 GMT
- Title: TNTC: two-stream network with transformer-based complementarity for
gait-based emotion recognition
- Authors: Chuanfei Hu, Weijie Sheng, Bo Dong, Xinde Li
- Abstract summary: Gait-based emotion recognition, especially from skeleton-based gait characteristics, has attracted much attention.
We propose a novel two-stream network with transformer-based complementarity, termed TNTC.
A new transformer-based complementarity module (TCM) is proposed to bridge the complementarity between the two streams hierarchically.
- Score: 4.9752798133038585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically recognizing human emotion from visual characteristics plays
a vital role in many intelligent applications. Recently, gait-based emotion
recognition, especially from skeleton-based gait characteristics, has attracted
much attention, and many methods have been proposed. The popular pipeline is to
first extract affective features from the skeleton joints and then aggregate
the skeleton joint and affective features into a feature vector for classifying
the emotion. However, the aggregation procedure in these methods can be rigid,
so the complementary relationship between skeleton joint and affective features
is insufficiently exploited. Meanwhile, the long-range dependencies in both the
spatial and temporal domains of the gait sequence are scarcely considered. To
address these issues, we propose a novel two-stream network with
transformer-based complementarity, termed TNTC. Skeleton joint and affective
features are encoded into two individual images that serve as the inputs of the
two streams, respectively. A new transformer-based complementarity module (TCM)
is proposed to bridge the complementarity between the two streams hierarchically
by capturing long-range dependencies. Experimental results demonstrate that
TNTC outperforms state-of-the-art methods in accuracy on the latest dataset.
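As a concrete illustration of the described pipeline, the following is a minimal PyTorch sketch of a two-stream network whose streams exchange information through a transformer-based complementarity module at each stage. The layer sizes, the number of stages, the token layout inside the module, and the four-class emotion head are illustrative assumptions, not the authors' exact TNTC architecture.

```python
# Hedged sketch of a two-stream network with a transformer-based
# complementarity module (TCM), loosely following the TNTC abstract.
# Channel sizes, two stages, the token layout, and the 4-class head are
# assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn


class TCM(nn.Module):
    """Fuses the two streams with multi-head self-attention so every
    spatial position can attend to every position of both streams."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, joint_feat, affect_feat):
        b, c, h, w = joint_feat.shape
        # Stack both feature maps into one token sequence: (B, 2*H*W, C).
        tokens = torch.cat([joint_feat, affect_feat], dim=2).flatten(2).transpose(1, 2)
        fused, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + fused)
        # Split the sequence back into the two streams and restore (B, C, H, W).
        joint_tok, affect_tok = tokens.chunk(2, dim=1)
        joint_out = joint_tok.transpose(1, 2).reshape(b, c, h, w)
        affect_out = affect_tok.transpose(1, 2).reshape(b, c, h, w)
        return joint_out, affect_out


class TwoStreamNet(nn.Module):
    def __init__(self, num_classes=4):  # e.g. happy/sad/angry/neutral (assumed)
        super().__init__()

        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

        # Each stream consumes one "feature image" (joint image / affective image).
        self.joint_stream = nn.ModuleList([block(3, 32), block(32, 64)])
        self.affect_stream = nn.ModuleList([block(3, 32), block(32, 64)])
        self.tcms = nn.ModuleList([TCM(32), TCM(64)])
        self.head = nn.Linear(64 * 2, num_classes)

    def forward(self, joint_img, affect_img):
        x, y = joint_img, affect_img
        for js, as_, tcm in zip(self.joint_stream, self.affect_stream, self.tcms):
            x, y = js(x), as_(y)
            x, y = tcm(x, y)          # hierarchical complementarity exchange
        x = x.mean(dim=(2, 3))        # global average pooling per stream
        y = y.mean(dim=(2, 3))
        return self.head(torch.cat([x, y], dim=1))


# Usage: two 64x64 "feature images", one per stream (sizes are assumptions).
logits = TwoStreamNet()(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
```

Fusing the streams by self-attending over their concatenated token sequences lets every position of one stream attend to every position of the other, which is one way to realize the hierarchical, long-range complementarity described in the abstract.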
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two forms of terms: diagonal and block-diagonal.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z) - GaitMA: Pose-guided Multi-modal Feature Fusion for Gait Recognition [26.721242606715354]
Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns.
We propose a novel gait recognition framework, dubbed Gait Multi-model Aggregation Network (GaitMA).
First, skeletons are represented by joint/limb-based heatmaps, and features from silhouettes and skeletons are respectively extracted using two CNN-based feature extractors.
arXiv Detail & Related papers (2024-07-20T09:05:17Z) - Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z) - A Two-stream Hybrid CNN-Transformer Network for Skeleton-based Human
Interaction Recognition [6.490564374810672]
We propose a Two-stream Hybrid CNN-Transformer Network (THCT-Net).
It exploits the local specificity of CNN and models global dependencies through the Transformer.
We show that the proposed method can better comprehend and infer the meaning and context of various actions, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2023-12-31T06:46:46Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - Skeleton-based Action Recognition through Contrasting Two-Stream
Spatial-Temporal Networks [11.66009967197084]
We propose a novel Contrastive GCN-Transformer Network (ConGT) which fuses the spatial and temporal modules in a parallel way.
We conduct experiments on three benchmark datasets, which demonstrate that our model achieves state-of-the-art performance in action recognition.
arXiv Detail & Related papers (2023-01-27T02:12:08Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - A Hierarchical Interactive Network for Joint Span-based Aspect-Sentiment
Analysis [34.1489054082536]
We propose a hierarchical interactive network (HI-ASA) to model two-way interactions between two tasks appropriately.
We use a cross-stitch mechanism to selectively combine the different task-specific features as the input, ensuring proper two-way interactions (a minimal cross-stitch sketch is given after this list).
Experiments on three real-world datasets demonstrate HI-ASA's superiority over baselines.
arXiv Detail & Related papers (2022-08-24T03:03:49Z) - Combining the Silhouette and Skeleton Data for Gait Recognition [13.345465199699]
Two dominant gait recognition works are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively.
This paper proposes a CNN-based branch taking silhouettes as input and a GCN-based branch taking skeletons as input.
For better gait representation in the GCN-based branch, we present a fully connected graph convolution operator to integrate multi-scale graph convolutions.
arXiv Detail & Related papers (2022-02-22T03:21:51Z) - Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.