Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting
- URL: http://arxiv.org/abs/2104.03520v1
- Date: Thu, 8 Apr 2021 05:44:02 GMT
- Title: Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting
- Authors: Changgong Zhang, Fangneng Zhan, Yuan Chang
- Abstract summary: 3D pose estimation from a single image is a challenging problem due to depth ambiguity.
One class of previous methods lifts 2D joints, obtained from external 2D pose detectors, to 3D space.
We propose a novel end-to-end framework that not only exploits contextual information but also produces the output directly in 3D space.
- Score: 10.336146336350811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D pose estimation from a single image is a challenging problem due to
depth ambiguity. One class of previous methods lifts 2D joints, obtained from
external 2D pose detectors, to 3D space. However, such approaches
discard the contextual information of images, which is a strong cue
for 3D pose estimation. Meanwhile, some other methods predict the joints
directly from monocular images but adopt a 2.5D output representation $P^{2.5D}
= (u,v,z^{r})$, where $u$ and $v$ are in the image space while $z^{r}$ is in
root-relative 3D space. Thus, ground-truth information (e.g., the depth of the
root joint from the camera) is normally required to transform the 2.5D output
to 3D space, which limits applicability in practice. In this work, we
propose a novel end-to-end framework that not only exploits contextual
information but also produces the output directly in 3D space via cascaded
dimension-lifting. Specifically, we decompose the task of lifting pose from 2D
image space to 3D space into sequential sub-tasks: 1) kinematic
skeleton and individual joint estimation in 2D space, 2) root-relative depth
estimation, and 3) lifting to 3D space, each of which employs direct
supervision and contextual image features to guide the learning process.
Extensive experiments show that the proposed framework achieves
state-of-the-art performance on two widely used 3D human pose datasets
(Human3.6M, MuPoTS-3D).
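To make the abstract's point concrete, the sketch below shows the standard pinhole back-projection that turns a 2.5D prediction $(u, v, z^{r})$ into camera-space 3D coordinates. It is a minimal, hypothetical illustration, not the paper's method: the names (`fx`, `fy`, `cx`, `cy`, `z_root`) are assumed intrinsics and root depth, and the need for the ground-truth `z_root` here is exactly the limitation the paper's direct-to-3D framework aims to avoid.

```python
import numpy as np

def lift_25d_to_3d(u, v, z_rel, z_root, fx, fy, cx, cy):
    """Back-project a 2.5D joint to 3D camera space via the pinhole model.

    u, v   : pixel coordinates of the joint
    z_rel  : joint depth relative to the root joint (z^r in the abstract)
    z_root : absolute depth of the root joint from the camera (ground truth)
    fx, fy : focal lengths; cx, cy : principal point
    """
    z = z_root + z_rel      # absolute depth of this joint
    x = (u - cx) * z / fx   # inverts u = fx * X / Z + cx
    y = (v - cy) * z / fy   # inverts v = fy * Y / Z + cy
    return np.array([x, y, z])
```

A joint predicted at the principal point with `z_rel = 0` maps back to `(0, 0, z_root)`, which shows why an error in the assumed root depth scales the entire recovered skeleton.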
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to the 3D pose obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z) - General Geometry-aware Weakly Supervised 3D Object Detection [62.26729317523975]
A unified framework is developed for learning 3D object detectors from RGB images and associated 2D boxes.
Experiments on KITTI and SUN-RGBD datasets demonstrate that our method yields surprisingly high-quality 3D bounding boxes with only 2D annotation.
arXiv Detail & Related papers (2024-07-18T17:52:08Z) - Unsupervised Multi-Person 3D Human Pose Estimation From 2D Poses Alone [4.648549457266638]
We present one of the first studies investigating the feasibility of unsupervised multi-person 2D-3D pose estimation.
Our method involves independently lifting each subject's 2D pose to 3D, before combining them in a shared 3D coordinate system.
This by itself enables us to retrieve an accurate 3D reconstruction of their poses.
arXiv Detail & Related papers (2023-09-26T11:42:56Z) - MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling [59.74064212110042]
MPM can handle multiple tasks including 3D human pose estimation, 3D pose estimation from occluded 2D pose, and 3D pose completion in a single framework.
We conduct extensive experiments and ablation studies on several widely used human pose datasets and achieve state-of-the-art performance on MPI-INF-3DHP.
arXiv Detail & Related papers (2023-06-29T10:30:00Z) - Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge [36.65402869749077]
We propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision.
We propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
WSP achieves state-of-the-art results on two widely-used benchmarks.
arXiv Detail & Related papers (2022-11-22T03:35:15Z) - SPGNet: Spatial Projection Guided 3D Human Pose Estimation in Low Dimensional Space [14.81199315166042]
We propose a method for 3D human pose estimation that mixes multi-dimensional re-projection into supervised learning.
Based on estimation results for the Human3.6M dataset, our approach outperforms many state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-06-04T00:51:00Z) - Lifting 2D Human Pose to 3D with Domain Adapted 3D Body Concept [49.49032810966848]
Existing 3D pose estimation suffers from 1) the inherent ambiguity between the 2D and 3D data, and 2) the lack of well labeled 2D-3D pose pairs in the wild.
We propose a new framework that leverages the labeled 3D human poses to learn a 3D concept of the human body to reduce the ambiguity.
By adapting the two domains, the body knowledge learned from 3D poses is applied to 2D poses and guides the 2D pose encoder to generate informative 3D "imagination" as embedding in pose lifting.
arXiv Detail & Related papers (2021-11-23T16:02:12Z) - SAT: 2D Semantics Assisted Training for 3D Visual Grounding [95.84637054325039]
3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.
Point clouds are sparse, noisy, and contain limited semantic information compared with 2D images.
We propose 2D Semantics Assisted Training (SAT) that utilizes 2D image semantics in the training stage to ease point-cloud-language joint representation learning.
arXiv Detail & Related papers (2021-05-24T17:58:36Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.