TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
- URL: http://arxiv.org/abs/2309.07910v1
- Date: Thu, 14 Sep 2023 17:56:30 GMT
- Title: TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
- Authors: Rohan Choudhury, Kris Kitani, Laszlo A. Jeni
- Abstract summary: We present an efficient multi-view pose estimation model that learns a robust temporal representation.
Our model is able to generalize across datasets without fine-tuning.
- Score: 27.3359362364858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing volumetric methods for 3D human pose estimation are
accurate, but computationally expensive and optimized for single time-step
prediction. We present TEMPO, an efficient multi-view pose estimation model
that learns a robust spatiotemporal representation, improving pose accuracy
while also tracking and forecasting human pose. We significantly reduce
computation compared to the state-of-the-art by recurrently computing
per-person 2D pose features, fusing both spatial and temporal information into
a single representation. In doing so, our model is able to use spatiotemporal
context to predict more accurate human poses without sacrificing efficiency. We
further use this representation to track human poses over time as well as
predict future poses. Finally, we demonstrate that our model is able to
generalize across datasets without scene-specific fine-tuning. TEMPO achieves
10$\%$ better MPJPE with a 33$\times$ improvement in FPS compared to TesseTrack
on the challenging CMU Panoptic Studio dataset.
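The abstract above sketches the core mechanism: per-person 2D pose features are computed recurrently, and spatial and temporal information is fused into a single representation that serves pose estimation, tracking, and forecasting. The snippet below is a minimal illustration of that recurrent-fusion pattern, not TEMPO's actual architecture; the GRU cell, feature dimensions, and linear heads are stand-in assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPoseFusion(nn.Module):
    """Illustrative only: fuse per-person pose features over time with a GRU.

    The feature dimensions and heads are hypothetical stand-ins, not TEMPO's
    actual architecture.
    """

    def __init__(self, feat_dim=256, num_joints=15):
        super().__init__()
        # Recurrent cell carrying a single spatiotemporal state per person.
        self.gru = nn.GRUCell(feat_dim, feat_dim)
        # Heads for the three tasks the abstract mentions.
        self.pose_head = nn.Linear(feat_dim, num_joints * 3)      # current 3D pose
        self.forecast_head = nn.Linear(feat_dim, num_joints * 3)  # future 3D pose
        self.embed_head = nn.Linear(feat_dim, 64)                 # tracking embedding

    def forward(self, per_person_feats):
        # per_person_feats: (T, N, feat_dim) -- features for N people over T frames,
        # e.g. pooled from per-view 2D pose features (feature extraction not shown).
        T, N, D = per_person_feats.shape
        state = per_person_feats.new_zeros(N, D)
        poses, forecasts, embeds = [], [], []
        for t in range(T):
            # Only a compact per-person feature is processed each step, so the
            # per-frame cost stays constant instead of growing with a temporal window.
            state = self.gru(per_person_feats[t], state)
            poses.append(self.pose_head(state))
            forecasts.append(self.forecast_head(state))
            embeds.append(self.embed_head(state))
        return torch.stack(poses), torch.stack(forecasts), torch.stack(embeds)


if __name__ == "__main__":
    model = RecurrentPoseFusion()
    feats = torch.randn(8, 3, 256)  # 8 frames, 3 people, hypothetical 256-d features
    poses, forecasts, embeds = model(feats)
    print(poses.shape, forecasts.shape, embeds.shape)  # (8, 3, 45), (8, 3, 45), (8, 3, 64)
```

The point the abstract emphasizes is captured by the loop: each frame only updates a compact per-person state, which is what keeps the method efficient relative to volumetric approaches that recompute a full spatial volume per time step.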
Related papers
- AnyPose: Anytime 3D Human Pose Forecasting via Neural Ordinary
Differential Equations [2.7195102129095003]
AnyPose is a lightweight continuous-time neural architecture that models human behavior dynamics with neural ordinary differential equations.
Our results demonstrate that AnyPose achieves high accuracy in predicting future poses while taking significantly less computation time than traditional methods; a rough sketch of the neural-ODE idea appears after this list.
arXiv Detail & Related papers (2023-09-09T16:59:57Z)
- Self-supervised 3D Human Pose Estimation from a Single Image [1.0878040851638]
We propose a new self-supervised method for predicting 3D human body pose from a single image.
The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses.
arXiv Detail & Related papers (2023-04-05T10:26:21Z)
- Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation [13.40702053084305]
We present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency of pose estimation in live stream videos.
A multi-scale convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling.
arXiv Detail & Related papers (2022-07-25T21:21:59Z)
- P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [78.83305967085413]
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task.
Our method outperforms state-of-the-art methods with fewer parameters and less computational overhead.
arXiv Detail & Related papers (2022-03-15T04:00:59Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
- Forecasting Characteristic 3D Poses of Human Actions [24.186058965796157]
We propose the task of forecasting characteristic 3D poses: from a monocular video observation of a person, predict a future 3D pose of that person in a likely action-defining, characteristic pose.
We define a semantically meaningful pose prediction task that decouples the predicted pose from time, taking inspiration from goal-directed behavior.
Our experiments with this dataset suggest that our proposed probabilistic approach outperforms state-of-the-art methods by 22% on average.
arXiv Detail & Related papers (2020-11-30T18:20:17Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
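The AnyPose entry above forecasts poses by modeling human motion dynamics with neural ordinary differential equations. Below is a minimal, dependency-free sketch of that general idea, not AnyPose's actual formulation: a small network predicts the time derivative of a flattened pose state, and the state is rolled forward with fixed-step Euler integration. The network, state layout, and integrator are illustrative assumptions; a full neural-ODE model would typically use an adaptive solver (e.g. torchdiffeq) and can be queried at arbitrary continuous times, which is what enables "anytime" forecasting.

```python
import torch
import torch.nn as nn

class PoseDynamics(nn.Module):
    """Illustrative dynamics network f(t, x): predicts the time derivative of a pose state."""

    def __init__(self, state_dim=45):  # e.g. 15 joints x 3 coordinates (hypothetical layout)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, 128), nn.Tanh(), nn.Linear(128, state_dim)
        )

    def forward(self, t, x):
        # Condition on time by concatenating it to the pose state.
        t_col = x.new_full((x.shape[0], 1), t)
        return self.net(torch.cat([x, t_col], dim=-1))


def forecast_pose(dynamics, pose_now, horizon=1.0, steps=20):
    """Integrate dx/dt = f(t, x) forward with fixed-step Euler to obtain a future pose.

    An adaptive ODE solver would replace this loop in a real neural-ODE system;
    Euler steps keep the sketch self-contained.
    """
    x = pose_now
    dt = horizon / steps
    for i in range(steps):
        x = x + dt * dynamics(i * dt, x)
    return x


if __name__ == "__main__":
    dyn = PoseDynamics()
    pose_now = torch.randn(2, 45)    # 2 people, flattened 3D poses (hypothetical)
    pose_future = forecast_pose(dyn, pose_now)
    print(pose_future.shape)         # torch.Size([2, 45])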