Pose Uncertainty Aware Movement Synchrony Estimation via
Spatial-Temporal Graph Transformer
- URL: http://arxiv.org/abs/2208.01161v1
- Date: Mon, 1 Aug 2022 22:35:32 GMT
- Title: Pose Uncertainty Aware Movement Synchrony Estimation via
Spatial-Temporal Graph Transformer
- Authors: Jicheng Li, Anjana Bhat, Roghayeh Barmaki
- Abstract summary: Movement synchrony reflects the coordination of body movements between interacting dyads.
This paper proposes a skeleton-based graph transformer for movement synchrony estimation.
Our method achieved an overall accuracy of 88.98% and surpassed its counterparts by a wide margin.
- Score: 7.053333608725945
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Movement synchrony reflects the coordination of body movements between
interacting dyads. The estimation of movement synchrony has been automated by
powerful deep learning models such as transformer networks. However, instead of
designing a specialized network for movement synchrony estimation, previous
transformer-based works broadly adopted architectures from other tasks such as
human activity recognition. Therefore, this paper proposes a skeleton-based
graph transformer for movement synchrony estimation. The proposed model applies
ST-GCN, a spatial-temporal graph convolutional network, for skeleton feature
extraction, followed by a spatial transformer for spatial feature generation.
The spatial transformer is guided by a uniquely designed joint position
embedding shared between the same joints of the interacting individuals. In
addition, we incorporate a temporal similarity matrix into the temporal
attention computation to account for the periodic nature of body movements.
Moreover, the confidence score associated with each joint reflects pose
uncertainty, a point that previous works on movement synchrony estimation have
not sufficiently emphasized. Since transformer networks demand a
significant amount of data to train, we constructed a dataset for movement
synchrony estimation using Human3.6M, a benchmark dataset for human activity
recognition, and pretrained our model on it using contrastive learning. We
further applied knowledge distillation to alleviate information loss introduced
by pose detector failure in a privacy-preserving way. We compared our method
with representative approaches on PT13, a dataset collected from autism therapy
interventions. Our method achieved an overall accuracy of 88.98% and surpassed
its counterparts by a wide margin while maintaining data privacy.
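To make two of the mechanisms above concrete, the temporal similarity matrix injected into temporal attention and the per-joint confidence scores that capture pose uncertainty, here is a minimal PyTorch sketch. It is not the authors' implementation: the module name TemporalSimilarityAttention, the use of cosine similarity to build the similarity matrix, the additive attention bias, and the multiplicative confidence weighting are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalSimilarityAttention(nn.Module):
    """Single-head temporal self-attention whose logits are biased by a
    frame-to-frame similarity matrix -- a stand-in for the paper's temporal
    similarity matrix (the exact construction there is not reproduced here)."""

    def __init__(self, dim: int, bias_weight: float = 1.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        self.bias_weight = bias_weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) per-frame skeleton features,
        # e.g. ST-GCN outputs pooled over joints.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale

        # Temporal similarity matrix: cosine similarity between frame
        # features, intended to highlight periodic (repeating) movements.
        feat = F.normalize(x, dim=-1)
        sim = feat @ feat.transpose(-2, -1)  # (batch, frames, frames)

        attn = (logits + self.bias_weight * sim).softmax(dim=-1)
        return self.proj(attn @ v)


def confidence_weighted_joints(joints: torch.Tensor,
                               confidence: torch.Tensor) -> torch.Tensor:
    """Down-weight uncertain joints with the pose detector's per-joint
    confidence scores -- one plausible way to make features
    'pose uncertainty aware'; the paper may integrate confidence differently.

    joints:     (batch, frames, num_joints, channels)
    confidence: (batch, frames, num_joints), values in [0, 1]
    """
    return joints * confidence.unsqueeze(-1)


# Toy usage (shapes are illustrative): weight joints by confidence,
# pool over joints, then apply similarity-biased temporal attention.
x = torch.randn(4, 30, 17, 64)     # batch, frames, joints, channels
conf = torch.rand(4, 30, 17)       # per-joint detection confidence
x = confidence_weighted_joints(x, conf).mean(dim=2)    # (4, 30, 64)
out = TemporalSimilarityAttention(dim=64)(x)           # (4, 30, 64)
```

The additive similarity bias and the multiplicative confidence weighting are illustrative design choices under these assumptions; the paper's actual formulation may differ.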
Related papers
- GTransPDM: A Graph-embedded Transformer with Positional Decoupling for Pedestrian Crossing Intention Prediction [6.327758022051579]
GTransPDM was developed for pedestrian crossing intention prediction by leveraging multi-modal features.
It achieves 92% accuracy on the PIE dataset and 87% accuracy on the JAAD dataset, with a processing time of 0.05 ms.
arXiv Detail & Related papers (2024-09-30T12:02:17Z) - ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data [8.660721666999718]
We propose a hybrid pipeline composed of asynchronous sensing and synchronous processing.
We achieve state-of-the-art performance with lower latency than competitors.
arXiv Detail & Related papers (2024-02-02T13:17:19Z) - Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation.
We introduce a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing.
We also employ adjacency modulation with the aim of learning meaningful correlations beyond those defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z) - Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction [106.06256351200068]
This paper introduces a model learning framework with auxiliary tasks.
In our auxiliary tasks, partial body joints' coordinates are corrupted by either masking or adding noise.
We propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data.
arXiv Detail & Related papers (2023-08-17T12:26:11Z) - TransFusion: A Practical and Effective Transformer-based Diffusion Model
for 3D Human Motion Prediction [1.8923948104852863]
We propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction.
Our model leverages Transformer as the backbone with long skip connections between shallow and deep layers.
In contrast to prior diffusion-based models that utilize extra modules like cross-attention and adaptive layer normalization, we treat all inputs, including conditions, as tokens to create a more lightweight model.
arXiv Detail & Related papers (2023-07-30T01:52:07Z) - Edge Continual Learning for Dynamic Digital Twins over Wireless Networks [68.65520952712914]
Digital twins (DTs) constitute a critical link between the real world and the metaverse.
In this paper, a novel edge continual learning framework is proposed to accurately model the evolving affinity between a physical twin and its corresponding cyber twin (CT).
The proposed framework achieves a simultaneously accurate and synchronous CT model that is robust to catastrophic forgetting.
arXiv Detail & Related papers (2022-04-10T23:25:37Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to model inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - Learning Iterative Robust Transformation Synchronization [71.73273007900717]
In this work, we avoid handcrafting robust loss functions, and propose to use graph neural networks (GNNs) to learn transformation synchronization.
arXiv Detail & Related papers (2021-11-01T07:03:14Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data toward recognizing gestures.
Results show that our approach recovers the performance with large improvement gains, up to 12.91% in accuracy (ACC) and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Skeleton-based Action Recognition via Spatial and Temporal Transformer
Networks [12.06555892772049]
We propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator.
The proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on par with the state of the art when adding bone information.
arXiv Detail & Related papers (2020-08-17T15:25:40Z) - End-to-end Contextual Perception and Prediction with Interaction
Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)