Real-time RGBD-based Extended Body Pose Estimation
- URL: http://arxiv.org/abs/2103.03663v1
- Date: Fri, 5 Mar 2021 13:37:50 GMT
- Title: Real-time RGBD-based Extended Body Pose Estimation
- Authors: Renat Bashirov, Anastasia Ianina, Karim Iskakov, Yevgeniy Kononenko,
Valeriya Strizhkova, Victor Lempitsky, Alexander Vakhitov
- Abstract summary: We present a system for real-time RGBD-based estimation of 3D human pose.
We use a parametric 3D deformable human mesh model (SMPL-X) as the representation.
We train estimators of body pose and facial expression parameters.
- Score: 57.61868412206493
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a system for real-time RGBD-based estimation of 3D human pose. We
use a parametric 3D deformable human mesh model (SMPL-X) as the representation and
focus on real-time estimation of the body pose, hand pose, and facial expression
parameters from a Kinect Azure RGB-D camera. We train estimators of
body pose and facial expression parameters. Both estimators use previously
published landmark extractors as input and custom annotated datasets for
supervision, while hand pose is estimated directly by a previously published
method. We combine the predictions of those estimators into a temporally-smooth
human pose. We train the facial expression extractor on a large talking face
dataset, which we annotate with facial expression parameters. For the body pose
we collect and annotate a dataset of 56 people captured from a rig of 5 Kinect
Azure RGB-D cameras and use it together with the large AMASS motion capture
dataset. Our RGB-D body pose model outperforms state-of-the-art RGB-only methods
and matches the accuracy of a slower optimization-based RGB-D solution. The
combined system runs at 30 FPS on a server
with a single GPU. The code will be available at
https://saic-violet.github.io/rgbd-kinect-pose
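The abstract states that the per-estimator predictions are fused into a temporally smooth pose, but does not spell out the filter here. Below is a minimal sketch of one standard way to do this: an exponential moving average over per-joint rotations, carried out in quaternion space so each blend stays on the rotation manifold. The filter choice and all names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: EMA smoothing of per-joint SMPL-X pose rotations,
# done via quaternions so the blend stays on the rotation manifold.
import numpy as np

def axis_angle_to_quat(aa):
    """Convert (J, 3) axis-angle vectors to (J, 4) unit quaternions (w, x, y, z)."""
    angle = np.linalg.norm(aa, axis=-1, keepdims=True)                # (J, 1)
    axis = np.where(angle > 1e-8, aa / np.maximum(angle, 1e-8), 0.0)  # (J, 3)
    half = 0.5 * angle
    return np.concatenate([np.cos(half), np.sin(half) * axis], axis=-1)

def quat_to_axis_angle(q):
    """Convert (J, 4) unit quaternions back to (J, 3) axis-angle vectors."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    w = np.clip(q[:, :1], -1.0, 1.0)
    angle = 2.0 * np.arccos(w)                                        # (J, 1)
    s = np.sqrt(np.maximum(1.0 - w * w, 1e-12))                       # sin(angle/2)
    return angle * q[:, 1:] / s

class PoseSmoother:
    """EMA over per-joint rotations; alpha in (0, 1], smaller = smoother."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = None  # (J, 4) quaternions of the previous smoothed frame

    def update(self, pose_aa):
        q = axis_angle_to_quat(pose_aa)
        if self.state is None:
            self.state = q
        else:
            # Flip signs so each blend takes the short way around
            # (q and -q encode the same rotation).
            dots = np.sum(self.state * q, axis=-1, keepdims=True)
            q = np.where(dots < 0, -q, q)
            blend = (1 - self.alpha) * self.state + self.alpha * q
            self.state = blend / np.linalg.norm(blend, axis=-1, keepdims=True)
        return quat_to_axis_angle(self.state)

smoother = PoseSmoother(alpha=0.3)
for frame in range(100):
    noisy_pose = 0.1 * np.random.randn(21, 3)  # stand-in for per-frame SMPL-X body pose
    smooth_pose = smoother.update(noisy_pose)  # feed this to the SMPL-X model instead
```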
Related papers
- EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans [5.047302480095444]
Monocular Human Pose Estimation aims at determining the 3D positions of human joints from a single 2D image captured by a camera.
In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model.
We introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet).
arXiv Detail & Related papers (2024-06-28T08:16:54Z)
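The EPOCH summary above argues for the full perspective camera model over the weak-perspective approximation common in monocular pose work. The following is a hedged sketch of the difference, not EPOCH's code; the intrinsics and joints are made-up placeholders.

```python
# Contrast: weak-perspective approximation vs. full (pinhole) perspective.
import numpy as np

def project_full_perspective(joints, f, c):
    """Pinhole projection: u = f * X / Z + cx, v = f * Y / Z + cy."""
    z = joints[:, 2:3]
    return f * joints[:, :2] / z + c

def project_weak_perspective(joints, f, c):
    """Weak perspective: all joints share one scale f / Z_mean, accurate
    only when the depth variation across the body is small."""
    z_mean = joints[:, 2].mean()
    return (f / z_mean) * joints[:, :2] + c

f, c = 1000.0, np.array([640.0, 360.0])          # placeholder intrinsics
joints = np.array([[0.0, -0.3, 3.0],             # head, slightly closer ...
                   [0.0,  0.5, 3.6]])            # ... than an outstretched foot
print(project_full_perspective(joints, f, c))    # per-joint depth matters
print(project_weak_perspective(joints, f, c))    # shared scale distorts
```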
- Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot [22.848563931757962]
We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image.
Predictions encompass the whole body, including hands and facial expressions, using the SMPL-X parametric model.
We show that incorporating the paper's close-up full-body dataset into the training data further enhances predictions, particularly for hands.
arXiv Detail & Related papers (2024-02-22T16:05:13Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM reconstructs the object while simultaneously estimating the relative camera poses, in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z)
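One common way to realize an adversarially learned pose prior like the one summarized above is to train a discriminator on real mocap poses (e.g. from AMASS) and use its output as a regularizer when fitting SMPL to 2D keypoints; the paper's exact construction may differ. A minimal PyTorch sketch under those assumptions, with architecture and loss details chosen for illustration:

```python
# Hedged sketch: a discriminator over SMPL pose parameters acts as a prior.
import torch
import torch.nn as nn

POSE_DIM = 23 * 3  # SMPL body pose in axis-angle

disc = nn.Sequential(
    nn.Linear(POSE_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # logit: high = realistic pose
)

def disc_loss(real_poses, fake_poses):
    """Standard GAN discriminator loss: real mocap poses vs. implausible ones."""
    bce = nn.functional.binary_cross_entropy_with_logits
    loss_real = bce(disc(real_poses), torch.ones(len(real_poses), 1))
    loss_fake = bce(disc(fake_poses), torch.zeros(len(fake_poses), 1))
    return loss_real + loss_fake

def pose_prior(pose):
    """Prior term for fitting: softplus(-logit) equals -log sigmoid(logit),
    so poses the discriminator rejects are penalized."""
    return nn.functional.softplus(-disc(pose)).mean()

# During fitting, the prior is added to the 2D reprojection loss:
pose = torch.zeros(1, POSE_DIM, requires_grad=True)
total = pose_prior(pose)  # + reprojection_loss(pose, keypoints_2d)
total.backward()
```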
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Lifting Monocular Events to 3D Human Poses [22.699272716854967]
This paper presents the first learning-based method for 3D human pose estimation from a single stream of asynchronous events.
Experiments demonstrate that our method achieves solid accuracy, narrowing the performance gap between standard RGB and event-based vision.
arXiv Detail & Related papers (2021-04-21T16:07:12Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
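The depth-to-scale (D2S) projection above is described only at a high level. One plausible reading is that each joint's 2D scale is modulated by its depth offset from the root, so nearer body parts render larger. The sketch below illustrates that reading only; the paper's exact operator may differ.

```python
# Hedged sketch of a D2S-style projection: per-joint scale from per-joint depth.
import numpy as np

def d2s_project(joints, s_root, z_root, c):
    """joints: (J, 3) root-relative 3D joints; s_root: scale at the root joint;
    z_root: root depth; c: (2,) image-plane translation."""
    z = z_root + joints[:, 2:3]          # absolute per-joint depth
    s = s_root * z_root / z              # per-joint scale variant
    return s * joints[:, :2] + c

joints = np.array([[0.0, 0.0, 0.0],      # root
                   [0.2, -0.1, -0.5]])   # a hand reaching toward the camera
print(d2s_project(joints, s_root=200.0, z_root=3.0, c=np.array([128.0, 128.0])))
```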
- Monocular Expressive Body Regression through Body-Driven Attention [68.63766976089842]
We introduce ExPose, which regresses the body, face, and hands, in SMPL-X format, from an RGB image.
Hands and faces are much smaller than the body, occupying very few image pixels.
We observe that body estimation localizes the face and hands reasonably well.
arXiv Detail & Related papers (2020-08-20T16:33:47Z)
- RGBD-Dog: Predicting Canine Pose from RGBD Sensors [25.747221533627464]
We focus on the problem of 3D canine pose estimation from RGBD images.
We generate a dataset of synthetic RGBD images from this data.
A stacked hourglass network is trained to predict 3D joint locations.
arXiv Detail & Related papers (2020-04-16T17:34:45Z)
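The RGBD-Dog summary mentions a stacked hourglass network; below is a minimal single-stack sketch of that architecture family (a symmetric encoder-decoder with skip connections at every resolution) mapping an RGBD image to per-joint heatmaps. The real model stacks several hourglasses and its 3D output head differs; the channel and joint counts here are placeholders.

```python
# Minimal single-stack hourglass sketch in PyTorch (illustrative only).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class Hourglass(nn.Module):
    def __init__(self, depth, ch):
        super().__init__()
        self.skip = conv_block(ch, ch)   # branch kept at this resolution
        self.down = conv_block(ch, ch)
        self.inner = Hourglass(depth - 1, ch) if depth > 1 else conv_block(ch, ch)
        self.up = conv_block(ch, ch)

    def forward(self, x):
        skip = self.skip(x)
        y = nn.functional.max_pool2d(x, 2)            # go down one resolution
        y = self.inner(self.down(y))                  # recurse to the bottleneck
        y = nn.functional.interpolate(self.up(y), scale_factor=2, mode='nearest')
        return skip + y                               # merge with the skip branch

class PoseNet(nn.Module):
    def __init__(self, n_joints=20, ch=64):
        super().__init__()
        self.stem = conv_block(4, ch)                 # 4 input channels: RGB + depth
        self.hg = Hourglass(depth=4, ch=ch)
        self.head = nn.Conv2d(ch, n_joints, 1)        # one heatmap per joint

    def forward(self, rgbd):
        return self.head(self.hg(self.stem(rgbd)))

net = PoseNet()
heatmaps = net(torch.randn(1, 4, 64, 64))             # -> (1, 20, 64, 64)
```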