STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans
- URL: http://arxiv.org/abs/2503.13344v2
- Date: Thu, 20 Mar 2025 10:11:27 GMT
- Title: STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans
- Authors: Shashikant Verma, Harish Katti, Soumyaratna Debnath, Yamuna Swamy, Shanmuganathan Raman
- Abstract summary: We introduce STEP, a novel framework utilizing a Transformer-based discriminative prediction model for simultaneous tracking and estimation of pose across diverse animal species and humans. Our approach does not rely on per-frame target detections due to its tracking capability. Our experiments demonstrate superior results compared to existing methods, opening doors to various applications.
- Score: 14.144097766150395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce STEP, a novel framework utilizing Transformer-based discriminative model prediction for simultaneous tracking and estimation of pose across diverse animal species and humans. We are inspired by the fact that the human brain exploits spatiotemporal continuity and performs concurrent localization and pose estimation despite the specialization of brain areas for form and motion processing. Traditional discriminative models typically require predefined target states to determine model weights, a challenge we address through the Gaussian Map Soft Prediction (GMSP) and Offset Map Regression Adapter (OMRA) modules. These modules remove the necessity of keypoint target states as input, streamlining the process. Our method starts with a known target state in the initial frame of a given video sequence, then seamlessly tracks the target and estimates keypoints of anatomical importance in subsequent frames. Unlike prevalent top-down pose estimation methods, our approach does not rely on per-frame target detections, owing to its tracking capability. This yields a significant gain in inference efficiency and broadens potential applications. We train and validate our approach on datasets encompassing diverse species. Our experiments demonstrate superior results compared to existing methods, opening doors to various applications, including but not limited to action recognition and behavioral analysis.
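To make the decoding idea concrete, here is a minimal PyTorch sketch of what the abstract describes: a Gaussian-map head (GMSP) scores coarse keypoint locations and an offset head (OMRA) refines them, so no keypoint target state is needed as input. The 1x1-conv heads, shapes, and decoding loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PoseTrackerHead(nn.Module):
    """Hypothetical GMSP/OMRA-style decoding head over backbone features."""
    def __init__(self, feat_dim=256, num_keypoints=17):
        super().__init__()
        self.gmsp = nn.Conv2d(feat_dim, num_keypoints, kernel_size=1)      # Gaussian score maps
        self.omra = nn.Conv2d(feat_dim, 2 * num_keypoints, kernel_size=1)  # (dx, dy) offsets

    def forward(self, feats):
        heat = torch.sigmoid(self.gmsp(feats))                   # (B, K, H, W)
        B, K, H, W = heat.shape
        offs = self.omra(feats).view(B, K, 2, H, W)
        peaks = heat.view(B, K, -1).argmax(dim=-1)               # coarse per-keypoint peak
        kpts = torch.zeros(B, K, 2)
        for b in range(B):
            for k in range(K):
                y, x = peaks[b, k] // W, peaks[b, k] % W
                # refine the coarse peak with the regressed sub-pixel offset
                kpts[b, k, 0] = x.float() + offs[b, k, 0, y, x]
                kpts[b, k, 1] = y.float() + offs[b, k, 1, y, x]
        return kpts

feats = torch.randn(1, 256, 64, 64)                # backbone features for one frame
print(PoseTrackerHead()(feats).shape)              # torch.Size([1, 17, 2])
```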
Related papers
- Unified Human Localization and Trajectory Prediction with Monocular Vision [64.19384064365431]
MonoTransmotion is a Transformer-based framework that uses only a monocular camera to jointly solve the localization and trajectory prediction tasks. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs.
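As a rough illustration of the joint formulation, the sketch below shares one Transformer encoder between a per-frame localization head and a trajectory head. All dimensions, head designs, and the name JointLocTraj are assumptions for illustration, not MonoTransmotion's actual architecture.

```python
import torch
import torch.nn as nn

class JointLocTraj(nn.Module):
    """Shared encoder with two task heads: localization and trajectory."""
    def __init__(self, d=128, horizon=12):
        super().__init__()
        self.horizon = horizon
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.loc_head = nn.Linear(d, 3)               # per-frame (x, y, z) position
        self.traj_head = nn.Linear(d, horizon * 2)    # future ground-plane trajectory

    def forward(self, tokens):                        # tokens: (B, T, d)
        h = self.encoder(tokens)
        loc = self.loc_head(h)                        # (B, T, 3)
        traj = self.traj_head(h[:, -1]).view(-1, self.horizon, 2)
        return loc, traj

loc, traj = JointLocTraj()(torch.randn(2, 9, 128))
print(loc.shape, traj.shape)                          # (2, 9, 3) (2, 12, 2)
```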
arXiv Detail & Related papers (2025-03-05T14:18:39Z) - Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction [45.77348842004666]
Motion Pattern Priors Memory Network is a memory-based method to uncover latent motion patterns in human behavior.
We introduce an addressing mechanism to retrieve the matched pattern and the potential target distributions for each prediction from the memory bank.
Experiments validate the effectiveness of our approach, achieving state-of-the-art trajectory prediction accuracy.
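The addressing mechanism described above can be pictured as a similarity lookup over a bank of stored motion patterns. Below is a minimal sketch under that reading; retrieve_pattern, the cosine scoring, and the top-k blend are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def retrieve_pattern(query, memory_keys, memory_values, top_k=3):
    # query: (d,), memory_keys: (N, d), memory_values: (N, h, 2) stored future patterns
    scores = F.cosine_similarity(query.unsqueeze(0), memory_keys, dim=1)  # (N,)
    weights, idx = scores.topk(top_k)
    weights = torch.softmax(weights, dim=0)
    # weighted blend of the top-k matched future motion patterns
    return (weights.view(-1, 1, 1) * memory_values[idx]).sum(dim=0)

bank_keys, bank_vals = torch.randn(64, 32), torch.randn(64, 12, 2)
print(retrieve_pattern(torch.randn(32), bank_keys, bank_vals).shape)  # torch.Size([12, 2])
```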
arXiv Detail & Related papers (2024-01-05T17:39:52Z) - STGlow: A Flow-based Generative Framework with Dual Graphormer for Pedestrian Trajectory Prediction [22.553356096143734]
We propose STGlow, a novel flow-based generative framework with a dual graphormer for pedestrian trajectory prediction.
Our method can more precisely model the underlying data distribution by optimizing the exact log-likelihood of motion behaviors.
Experimental results on several benchmarks demonstrate that our method achieves much better performance compared to previous state-of-the-art approaches.
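The exact log-likelihood objective that flow-based models optimize can be shown with a single generic affine coupling layer: the transform is invertible, so the density of a motion vector follows from the base density plus the Jacobian log-determinant. This is a textbook coupling flow, not STGlow's dual-graphormer architecture.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One invertible affine coupling layer over a flattened motion vector."""
    def __init__(self, dim=24):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))      # predicts scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t                        # invertible transform
        log_det = s.sum(dim=-1)                           # Jacobian log-determinant
        return torch.cat([x1, y2], dim=-1), log_det

x = torch.randn(8, 24)                                    # flattened future motion
z, log_det = Coupling()(x)
base = torch.distributions.Normal(0., 1.)
log_px = base.log_prob(z).sum(dim=-1) + log_det           # exact log-likelihood
print(log_px.shape)                                       # torch.Size([8])
```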
arXiv Detail & Related papers (2022-11-21T07:29:24Z) - Koopman pose predictions for temporally consistent human walking estimations [11.016730029019522]
We introduce a new factor graph factor based on Koopman theory that embeds the nonlinear dynamics of lower-limb movement activities.
We show that our approach reduces skeleton-form outliers by almost 1 m while preserving natural walking trajectories at depths beyond 10 m.
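The Koopman idea is to lift nonlinear gait dynamics into a space of observables where they evolve (approximately) linearly, then fit the linear operator by least squares. A minimal NumPy sketch, with a toy lifting function chosen purely for illustration:

```python
import numpy as np

def lift(x):                                  # x: (n, 2), e.g. joint angle and velocity
    # illustrative observables: raw state, squared angle, sine of angle
    return np.column_stack([x, x[:, :1] ** 2, np.sin(x[:, :1])])

rng = np.random.default_rng(0)
states = rng.normal(size=(100, 2))            # toy sequence of gait states
X, Y = lift(states[:-1]), lift(states[1:])    # observables at t and t+1
K, *_ = np.linalg.lstsq(X, Y, rcond=None)     # Koopman matrix: Y ≈ X @ K
pred_next = lift(states[:1]) @ K              # linear one-step prediction
print(K.shape, pred_next.shape)               # (4, 4) (1, 4)
```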
arXiv Detail & Related papers (2022-05-05T16:16:06Z) - Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework that formulates the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
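A minimal sketch of the diffusion training step implied above: noise a future trajectory, condition on the history/social state embedding, and regress the added noise. The tiny MLP stands in for the paper's Transformer denoiser, and every size and schedule here is an assumption.

```python
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, dim=0)

denoiser = nn.Sequential(nn.Linear(24 + 32 + 1, 128), nn.ReLU(), nn.Linear(128, 24))
traj = torch.randn(8, 24)                 # flattened future trajectory (12 steps x 2)
hist = torch.randn(8, 32)                 # state embedding of history + social context

t = torch.randint(0, T, (8,))
noise = torch.randn_like(traj)
a = alpha_bar[t].unsqueeze(1)
noisy = a.sqrt() * traj + (1 - a).sqrt() * noise                      # forward diffusion
pred = denoiser(torch.cat([noisy, hist, t.unsqueeze(1).float() / T], dim=1))
loss = ((pred - noise) ** 2).mean()                                   # noise-prediction loss
print(loss.item())
```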
arXiv Detail & Related papers (2022-03-25T16:59:08Z) - Model-based gait recognition using graph network on very large population database [3.8707695363745223]
In this paper, to cope with the growing number of subjects and the variation across views, local features are built and a siamese network is proposed.
Experiments on the very large population dataset OUMVLP-Pose and the popular CASIA-B dataset show that our method achieves state-of-the-art (SOTA) performance in model-based gait recognition.
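A siamese network of this kind is typically trained with a pairwise objective over sequence embeddings. The sketch below uses a generic contrastive loss as a stand-in; it is not claimed to be the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    # pull same-subject pairs together, push different subjects apart by a margin
    d = F.pairwise_distance(emb_a, emb_b)                       # (B,)
    return (same * d.pow(2) +
            (1 - same) * F.relu(margin - d).pow(2)).mean()

emb_a, emb_b = torch.randn(16, 64), torch.randn(16, 64)         # pose-sequence embeddings
same = torch.randint(0, 2, (16,)).float()                       # 1 = same subject
print(contrastive_loss(emb_a, emb_b, same).item())
```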
arXiv Detail & Related papers (2021-12-20T02:28:02Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, with fast training and inference.
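Sparse attention on the spatial dimension can be pictured as masking the attention matrix to the skeleton's adjacency, as in the untrained single-head sketch below; the 5-joint chain topology is a toy assumption.

```python
import torch

J, d = 5, 16
x = torch.randn(J, d)                                   # per-joint features
q, k, v = x, x, x                                       # single head, untrained

adj = torch.eye(J, dtype=torch.bool)
for i in range(J - 1):                                  # chain skeleton: joint i <-> i+1
    adj[i, i + 1] = adj[i + 1, i] = True

scores = (q @ k.T) / d ** 0.5
scores = scores.masked_fill(~adj, float('-inf'))        # attend only to skeletal neighbors
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)                                        # torch.Size([5, 16])
```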
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
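One way to read "learning pose dynamics with a GNN" is message passing over the skeleton graph followed by per-joint displacement regression, as in the sketch below. The single-layer design, graph, and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseDynamicsGNN(nn.Module):
    """One round of mean-aggregation message passing, then (dx, dy) per joint."""
    def __init__(self, d=32):
        super().__init__()
        self.msg = nn.Linear(d, d)
        self.out = nn.Linear(2 * d, 2)

    def forward(self, feats, adj):                 # feats: (J, d), adj: (J, J)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (adj @ self.msg(feats)) / deg        # mean over skeletal neighbors
        return self.out(torch.cat([feats, agg], dim=-1))

J = 15
adj = torch.zeros(J, J)
for i in range(J - 1):                             # toy chain skeleton
    adj[i, i + 1] = adj[i + 1, i] = 1.0
delta = PoseDynamicsGNN()(torch.randn(J, 32), adj)
print(delta.shape)                                 # torch.Size([15, 2])
```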
arXiv Detail & Related papers (2021-06-07T16:36:50Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To account for a real-world challenge, we learn an indicator representing whether an estimated body joint is visible or invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
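The visibility indicator naturally enters training as a mask on the forecasting loss, so invisible joints do not contribute. A minimal sketch under that assumption (not TRiPOD's exact objective):

```python
import torch

pred = torch.randn(8, 14, 2)                     # predicted joints: (batch, joints, xy)
target = torch.randn(8, 14, 2)
visible = torch.randint(0, 2, (8, 14)).float()   # per-joint visibility indicator

err = ((pred - target) ** 2).sum(dim=-1)         # per-joint squared error
loss = (err * visible).sum() / visible.sum().clamp(min=1)   # invisible joints masked out
print(loss.item())
```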
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition.
We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space.
Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
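Neighborhood discovery in the latent space can be sketched as thresholded similarity search over unlabeled embeddings; the selection rule and threshold below are illustrative assumptions, not TraND's trained criterion:

```python
import torch
import torch.nn.functional as F

def confident_neighbors(feats, thresh=0.7):
    # keep, for each sample, only neighbors above a cosine-similarity threshold
    f = F.normalize(feats, dim=1)
    sim = f @ f.T                                   # pairwise cosine similarity
    sim.fill_diagonal_(-1.0)                        # exclude self-matches
    return [torch.nonzero(row > thresh).flatten().tolist() for row in sim]

feats = torch.randn(10, 64)                         # unlabeled gait embeddings
print(confident_neighbors(feats)[:3])
```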
arXiv Detail & Related papers (2021-02-09T03:07:07Z) - Motion Prediction Using Temporal Inception Module [96.76721173517895]
We propose a Temporal Inception Module (TIM) to encode human motion.
Our framework produces input embeddings using convolutional layers, by using different kernel sizes for different input lengths.
Experimental results on the standard motion prediction benchmarks, Human3.6M and the CMU motion capture dataset, show that our approach consistently outperforms state-of-the-art methods.
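The multi-kernel idea is easy to see in code: parallel 1D convolutions with different kernel sizes over the motion sequence, concatenated so short and long temporal contexts are embedded together. Kernel sizes and channel widths below are assumptions:

```python
import torch
import torch.nn as nn

class TemporalInception(nn.Module):
    """Parallel 1D convs with different kernel sizes, concatenated on channels."""
    def __init__(self, in_ch=54, out_ch=32, kernels=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels)

    def forward(self, x):                  # x: (B, joints*3, T)
        return torch.cat([b(x) for b in self.branches], dim=1)

seq = torch.randn(4, 54, 50)               # 50 frames of 18 joints x 3 coords
print(TemporalInception()(seq).shape)      # torch.Size([4, 96, 50])
```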
arXiv Detail & Related papers (2020-10-06T20:26:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.