Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation
- URL: http://arxiv.org/abs/2207.12537v1
- Date: Mon, 25 Jul 2022 21:21:59 GMT
- Title: Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation
- Authors: Zhouping Wang and Sarah Ostadabbas
- Abstract summary: We present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency pose in live stream videos.
A multi-scale convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling.
- Score: 13.40702053084305
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: 3D Human body pose and shape estimation within a temporal sequence can be
quite critical for understanding human behavior. Despite the significant
progress in human pose estimation in the recent years, which are often based on
single images or videos, human motion estimation on live stream videos is still
a rarely-touched area considering its special requirements for real-time output
and temporal consistency. To address this problem, we present a temporally
embedded 3D human body pose and shape estimation (TePose) method to improve the
accuracy and temporal consistency of pose estimation in live stream videos.
TePose uses previous predictions as a bridge to feedback the error for better
estimation in the current frame and to learn the correspondence between data
frames and predictions in the history. A multi-scale spatio-temporal graph
convolutional network is presented as the motion discriminator for adversarial
training using datasets without any 3D labeling. We propose a sequential data
loading strategy to meet the special start-to-end data processing requirement
of live stream. We demonstrate the importance of each proposed module with
extensive experiments. The results show the effectiveness of TePose on
widely-used human pose benchmarks with state-of-the-art performance.
Related papers
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z) - STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation [27.854074900345314]
We propose STRIDE, a novel Test-Time Training (TTT) approach to fit a human motion prior to each video.
Our framework demonstrates flexibility by being model-agnostic, allowing us to use any off-the-shelf 3D pose estimation method for improving robustness and temporal consistency.
We validate STRIDE's efficacy through comprehensive experiments on challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion.
arXiv Detail & Related papers (2023-12-24T11:05:10Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - Kinematic-aware Hierarchical Attention Network for Human Pose Estimation
in Videos [17.831839654593452]
Previous-based human pose estimation methods have shown promising results by leveraging features of consecutive frames.
Most approaches compromise accuracy to jitter and do not comprehend the temporal aspects of human motion.
We design an architecture that exploits kinematic keypoint features.
arXiv Detail & Related papers (2022-11-29T01:46:11Z) - Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z) - Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in current fame.
Specifically, we derive this prediction of dynamics through a graph neural network(GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods drop when the target person is cluded by other objects, or the motion is too fast/slow relative to the scale and speed of the training data.
We introduce atemporal-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - A Graph Attention Spatio-temporal Convolutional Network for 3D Human
Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in human skeleton by modeling local global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z) - Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.