Real-time RGBD-based Extended Body Pose Estimation
- URL: http://arxiv.org/abs/2103.03663v1
- Date: Fri, 5 Mar 2021 13:37:50 GMT
- Title: Real-time RGBD-based Extended Body Pose Estimation
- Authors: Renat Bashirov, Anastasia Ianina, Karim Iskakov, Yevgeniy Kononenko,
Valeriya Strizhkova, Victor Lempitsky, Alexander Vakhitov
- Abstract summary: We present a system for real-time RGBD-based estimation of 3D human pose.
We use a parametric 3D deformable human mesh model (SMPL-X) as the representation.
We train estimators of body pose and facial expression parameters.
- Score: 57.61868412206493
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a system for real-time RGBD-based estimation of 3D human pose. We
use a parametric 3D deformable human mesh model (SMPL-X) as the representation and
focus on real-time estimation of the body pose, hand pose, and facial expression
parameters from a Kinect Azure RGB-D camera. We train estimators of
body pose and facial expression parameters. Both estimators use previously
published landmark extractors as input and custom annotated datasets for
supervision, while hand pose is estimated directly by a previously published
method. We combine the predictions of those estimators into a temporally-smooth
human pose. We train the facial expression extractor on a large talking face
dataset, which we annotate with facial expression parameters. For the body pose
we collect and annotate a dataset of 56 people captured from a rig of 5 Kinect
Azure RGB-D cameras and use it together with the large AMASS motion capture
dataset. Our RGB-D body pose model outperforms state-of-the-art RGB-only methods
and matches the accuracy of a slower optimization-based RGB-D solution. The
combined system runs at 30 FPS on a server
with a single GPU. The code will be available at
https://saic-violet.github.io/rgbd-kinect-pose
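The abstract states that the per-estimator predictions are fused into a temporally smooth pose, but does not spell out the filter here. Below is a minimal sketch of one standard way to do this: an exponential moving average over per-joint rotations, carried out in quaternion space so each blend stays on the rotation manifold. The filter choice and all names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: EMA smoothing of per-joint SMPL-X pose rotations,
# done via quaternions so the blend stays on the rotation manifold.
import numpy as np

def axis_angle_to_quat(aa):
    """Convert (J, 3) axis-angle vectors to (J, 4) unit quaternions (w, x, y, z)."""
    angle = np.linalg.norm(aa, axis=-1, keepdims=True)                # (J, 1)
    axis = np.where(angle > 1e-8, aa / np.maximum(angle, 1e-8), 0.0)  # (J, 3)
    half = 0.5 * angle
    return np.concatenate([np.cos(half), np.sin(half) * axis], axis=-1)

def quat_to_axis_angle(q):
    """Convert (J, 4) unit quaternions back to (J, 3) axis-angle vectors."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    w = np.clip(q[:, :1], -1.0, 1.0)
    angle = 2.0 * np.arccos(w)                                        # (J, 1)
    s = np.sqrt(np.maximum(1.0 - w * w, 1e-12))                       # sin(angle/2)
    return angle * q[:, 1:] / s

class PoseSmoother:
    """EMA over per-joint rotations; alpha in (0, 1], smaller = smoother."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = None  # (J, 4) quaternions of the previous smoothed frame

    def update(self, pose_aa):
        q = axis_angle_to_quat(pose_aa)
        if self.state is None:
            self.state = q
        else:
            # Flip signs so each blend takes the short way around
            # (q and -q encode the same rotation).
            dots = np.sum(self.state * q, axis=-1, keepdims=True)
            q = np.where(dots < 0, -q, q)
            blend = (1 - self.alpha) * self.state + self.alpha * q
            self.state = blend / np.linalg.norm(blend, axis=-1, keepdims=True)
        return quat_to_axis_angle(self.state)

smoother = PoseSmoother(alpha=0.3)
for frame in range(100):
    noisy_pose = 0.1 * np.random.randn(21, 3)  # stand-in for per-frame SMPL-X body pose
    smooth_pose = smoother.update(noisy_pose)  # feed this to the SMPL-X model instead
```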
Related papers
- EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans [5.047302480095444]
Monocular Human Pose Estimation aims at determining the 3D positions of human joints from a single 2D image captured by a camera.
In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model.
We introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet).
arXiv Detail & Related papers (2024-06-28T08:16:54Z)
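The EPOCH summary above argues for the full perspective camera model over the weak-perspective approximation common in monocular pose work. The following is a hedged sketch of the difference, not EPOCH's code; the intrinsics and joints are made-up placeholders.

```python
# Contrast: weak-perspective approximation vs. full (pinhole) perspective.
import numpy as np

def project_full_perspective(joints, f, c):
    """Pinhole projection: u = f * X / Z + cx, v = f * Y / Z + cy."""
    z = joints[:, 2:3]
    return f * joints[:, :2] / z + c

def project_weak_perspective(joints, f, c):
    """Weak perspective: all joints share one scale f / Z_mean, accurate
    only when the depth variation across the body is small."""
    z_mean = joints[:, 2].mean()
    return (f / z_mean) * joints[:, :2] + c

f, c = 1000.0, np.array([640.0, 360.0])          # placeholder intrinsics
joints = np.array([[0.0, -0.3, 3.0],             # head, slightly closer ...
                   [0.0,  0.5, 3.6]])            # ... than an outstretched foot
print(project_full_perspective(joints, f, c))    # per-joint depth matters
print(project_weak_perspective(joints, f, c))    # shared scale distorts
```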
- Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot [22.848563931757962]
We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image.
Predictions encompass the whole body, including hands and facial expressions, using the SMPL-X parametric model.
We show that incorporating the paper's close-up full-body dataset into the training data further enhances predictions, particularly for hands.
arXiv Detail & Related papers (2024-02-22T16:05:13Z)
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction [77.89935657608926]
We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images.
PF-LRM reconstructs the object while simultaneously estimating the relative camera poses, in 1.3 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-11-20T18:57:55Z)
- Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z)
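One common way to realize an adversarially learned pose prior like the one summarized above is to train a discriminator on real mocap poses (e.g. from AMASS) and use its output as a regularizer when fitting SMPL to 2D keypoints; the paper's exact construction may differ. A minimal PyTorch sketch under those assumptions, with architecture and loss details chosen for illustration:

```python
# Hedged sketch: a discriminator over SMPL pose parameters acts as a prior.
import torch
import torch.nn as nn

POSE_DIM = 23 * 3  # SMPL body pose in axis-angle

disc = nn.Sequential(
    nn.Linear(POSE_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # logit: high = realistic pose
)

def disc_loss(real_poses, fake_poses):
    """Standard GAN discriminator loss: real mocap poses vs. implausible ones."""
    bce = nn.functional.binary_cross_entropy_with_logits
    loss_real = bce(disc(real_poses), torch.ones(len(real_poses), 1))
    loss_fake = bce(disc(fake_poses), torch.zeros(len(fake_poses), 1))
    return loss_real + loss_fake

def pose_prior(pose):
    """Prior term for fitting: softplus(-logit) equals -log sigmoid(logit),
    so poses the discriminator rejects are penalized."""
    return nn.functional.softplus(-disc(pose)).mean()

# During fitting, the prior is added to the 2D reprojection loss:
pose = torch.zeros(1, POSE_DIM, requires_grad=True)
total = pose_prior(pose)  # + reprojection_loss(pose, keypoints_2d)
total.backward()
```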
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- Lifting Monocular Events to 3D Human Poses [22.699272716854967]
This paper presents the first learning-based method for 3D human pose estimation from a single stream of asynchronous events.
Experiments demonstrate that our method achieves solid accuracy, narrowing the performance gap between standard RGB and event-based vision.
arXiv Detail & Related papers (2021-04-21T16:07:12Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
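The depth-to-scale (D2S) projection above is described only at a high level. One plausible reading is that each joint's 2D scale is modulated by its depth offset from the root, so nearer body parts render larger. The sketch below illustrates that reading only; the paper's exact operator may differ.

```python
# Hedged sketch of a D2S-style projection: per-joint scale from per-joint depth.
import numpy as np

def d2s_project(joints, s_root, z_root, c):
    """joints: (J, 3) root-relative 3D joints; s_root: scale at the root joint;
    z_root: root depth; c: (2,) image-plane translation."""
    z = z_root + joints[:, 2:3]          # absolute per-joint depth
    s = s_root * z_root / z              # per-joint scale variant
    return s * joints[:, :2] + c

joints = np.array([[0.0, 0.0, 0.0],      # root
                   [0.2, -0.1, -0.5]])   # a hand reaching toward the camera
print(d2s_project(joints, s_root=200.0, z_root=3.0, c=np.array([128.0, 128.0])))
```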
- Monocular Expressive Body Regression through Body-Driven Attention [68.63766976089842]
We introduce ExPose, which regresses the body, face, and hands, in SMPL-X format, from an RGB image.
Hands and faces are much smaller than the body, occupying very few image pixels.
We observe that body estimation localizes the face and hands reasonably well.
arXiv Detail & Related papers (2020-08-20T16:33:47Z)
- RGBD-Dog: Predicting Canine Pose from RGBD Sensors [25.747221533627464]
We focus on the problem of 3D canine pose estimation from RGBD images.
We generate a dataset of synthetic RGBD images from this data.
A stacked hourglass network is trained to predict 3D joint locations.
arXiv Detail & Related papers (2020-04-16T17:34:45Z)
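The RGBD-Dog summary mentions a stacked hourglass network; below is a minimal single-stack sketch of that architecture family (a symmetric encoder-decoder with skip connections at every resolution) mapping an RGBD image to per-joint heatmaps. The real model stacks several hourglasses and its 3D output head differs; the channel and joint counts here are placeholders.

```python
# Minimal single-stack hourglass sketch in PyTorch (illustrative only).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class Hourglass(nn.Module):
    def __init__(self, depth, ch):
        super().__init__()
        self.skip = conv_block(ch, ch)   # branch kept at this resolution
        self.down = conv_block(ch, ch)
        self.inner = Hourglass(depth - 1, ch) if depth > 1 else conv_block(ch, ch)
        self.up = conv_block(ch, ch)

    def forward(self, x):
        skip = self.skip(x)
        y = nn.functional.max_pool2d(x, 2)            # go down one resolution
        y = self.inner(self.down(y))                  # recurse to the bottleneck
        y = nn.functional.interpolate(self.up(y), scale_factor=2, mode='nearest')
        return skip + y                               # merge with the skip branch

class PoseNet(nn.Module):
    def __init__(self, n_joints=20, ch=64):
        super().__init__()
        self.stem = conv_block(4, ch)                 # 4 input channels: RGB + depth
        self.hg = Hourglass(depth=4, ch=ch)
        self.head = nn.Conv2d(ch, n_joints, 1)        # one heatmap per joint

    def forward(self, rgbd):
        return self.head(self.hg(self.stem(rgbd)))

net = PoseNet()
heatmaps = net(torch.randn(1, 4, 64, 64))             # -> (1, 20, 64, 64)
```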