3D Human Pose and Shape Estimation via HybrIK-Transformer
- URL: http://arxiv.org/abs/2302.04774v4
- Date: Sat, 22 Apr 2023 18:11:30 GMT
- Title: 3D Human Pose and Shape Estimation via HybrIK-Transformer
- Authors: Boris N. Oreshkin
- Abstract summary: HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimates.
We propose an enhancement of the 2D-to-3D lifting module, replacing the deconvolution with a Transformer.
- Score: 11.193504036335503
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimates from monocular 2D images. HybrIK has three major components: (1) a pretrained convolutional backbone, (2) a deconvolution module that lifts the 3D pose from 2D convolutional features, and (3) an analytical inverse kinematics pass that corrects the deep learning prediction using a learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D-to-3D lifting module, replacing the deconvolution with a Transformer, which improves both accuracy and computational efficiency relative to the original HybrIK method. We demonstrate our results on the commonly used H36M, PW3D, COCO and HP3D datasets. Our code is publicly available at https://github.com/boreshkinai/hybrik-transformer.
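For intuition, the sketch below shows one way a Transformer can replace a deconvolution head for 2D-to-3D lifting: flattened backbone features serve as attention memory for learned per-joint queries, and a linear head regresses one 3D coordinate per joint. Module names, dimensions, and the joint count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TransformerLiftingHead(nn.Module):
    """Illustrative 2D-to-3D lifting: learned joint queries cross-attend
    to flattened CNN features through a standard Transformer decoder.
    Positional encodings for the feature tokens are omitted for brevity."""

    def __init__(self, feat_dim=2048, d_model=256, n_joints=29,
                 n_heads=8, n_layers=3):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, d_model, kernel_size=1)
        self.joint_queries = nn.Parameter(torch.randn(n_joints, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.to_xyz = nn.Linear(d_model, 3)  # one 3D coordinate per joint

    def forward(self, feats):  # feats: (B, feat_dim, H, W) backbone output
        tokens = self.proj(feats).flatten(2).transpose(1, 2)  # (B, H*W, d_model)
        queries = self.joint_queries.expand(feats.size(0), -1, -1)
        joints = self.decoder(queries, tokens)  # (B, n_joints, d_model)
        return self.to_xyz(joints)              # (B, n_joints, 3)

# Usage: lift a 2048-channel 8x8 feature map to 29 three-dimensional joints.
head = TransformerLiftingHead()
xyz = head(torch.randn(2, 2048, 8, 8))  # -> torch.Size([2, 29, 3])
```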
Related papers
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on the Gaussian Splatting (GS) representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
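For concreteness, below is a minimal sketch of what one GS ellipsoid carries, assuming the standard 3D Gaussian Splatting parameterization (mean, per-axis scale, orientation quaternion, opacity, color); the GSD paper's exact fields may differ.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """One ellipsoid of a Gaussian Splatting scene (illustrative fields)."""
    mean: np.ndarray      # (3,) center in world space
    scale: np.ndarray     # (3,) per-axis standard deviations
    rotation: np.ndarray  # (4,) unit quaternion (w, x, y, z)
    opacity: float        # alpha-blending weight in [0, 1]
    color: np.ndarray     # (3,) RGB; real systems often store SH coefficients

    def covariance(self) -> np.ndarray:
        """Sigma = R diag(s)^2 R^T, the ellipsoid's 3D covariance."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S @ R.T
```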
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
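As a rough illustration of the idea, the sketch below maps 3D points onto a 2D grid so each point can reuse a pretrained 2D model's positional embeddings. The axis-dropping projection, normalization, and names are hypothetical assumptions, not the Any2Point implementation.

```python
import torch

def virtual_2d_positions(points, plane=(0, 1), grid_size=14):
    """Map each 3D point to a flat index on a ViT-style 2D grid by
    dropping one axis and normalizing the remaining two (illustrative)."""
    xy = points[:, list(plane)]                    # (N, 2) keep two axes
    lo = xy.min(dim=0, keepdim=True).values
    hi = xy.max(dim=0, keepdim=True).values
    norm = (xy - lo) / (hi - lo + 1e-8)            # normalize to [0, 1]
    idx = (norm * (grid_size - 1)).round().long()  # (N, 2) grid coordinates
    return idx[:, 0] * grid_size + idx[:, 1]       # flat token positions

# Each 3D point now indexes one row of a pretrained 2D positional table.
pts = torch.rand(1024, 3)
pos_table = torch.randn(14 * 14, 768)              # e.g. ViT pos-embeddings
pos_for_points = pos_table[virtual_2d_positions(pts)]  # (1024, 768)
```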
arXiv Detail & Related papers (2024-04-11T17:59:45Z)
- 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis [49.352765055181436]
We propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis.
Our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction.
arXiv Detail & Related papers (2024-04-09T12:47:30Z)
- HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery [40.88628101598707]
This paper presents a novel hybrid inverse kinematics solution, HybrIK, that integrates 3D keypoint estimation and body mesh recovery.
HybrIK directly transforms accurate 3D joints to body-part rotations via twist-and-swing decomposition.
We further develop a holistic framework, HybrIK-X, which enhances HybrIK with articulated hands and an expressive face.
arXiv Detail & Related papers (2023-04-12T08:29:31Z)
- Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers [28.586258731448687]
We present a Transformer-based pose uplifting scheme that can operate on temporally sparse 2D pose sequences.
We show how masked token modeling can be utilized for temporal upsampling within Transformer blocks.
We evaluate our method on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP.
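A minimal sketch of the masked-token upsampling idea mentioned above: frames missing from the sparse input are filled with a shared learned [MASK] token, and a Transformer encoder predicts a 3D pose at every timestep. The dimensions and module layout here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MaskedUpsampler(nn.Module):
    """Illustrative temporal upsampling: observed frames are embedded 2D
    poses, missing frames a learned mask token; the encoder fills the gaps."""

    def __init__(self, n_joints=17, d_model=256, n_heads=8,
                 n_layers=4, max_len=243):
        super().__init__()
        self.embed = nn.Linear(n_joints * 2, d_model)
        self.mask_token = nn.Parameter(torch.randn(d_model))
        self.time_emb = nn.Parameter(torch.randn(max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_joints * 3)

    def forward(self, poses_2d, observed):  # (B, T, J, 2), (T,) bool
        B, T, J, _ = poses_2d.shape
        x = self.embed(poses_2d.reshape(B, T, J * 2))
        x = torch.where(observed[None, :, None], x,
                        self.mask_token.expand(B, T, -1))
        x = x + self.time_emb[:T]
        return self.head(self.encoder(x)).reshape(B, T, J, 3)

# Usage: observe every 4th frame, predict dense 3D poses for all 81 frames.
model = MaskedUpsampler()
obs = torch.zeros(81, dtype=torch.bool)
obs[::4] = True
out = model(torch.randn(2, 81, 17, 2), obs)  # -> (2, 81, 17, 3)
```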
arXiv Detail & Related papers (2022-10-12T12:00:56Z)
- To The Point: Correspondence-driven monocular 3D category reconstruction [39.811816510186475]
To The Point (TTP) is a method for reconstructing 3D objects from a single image using 2D to 3D correspondences learned from weak supervision.
Using these learned correspondences, we replace CNN-based regression of camera pose and non-rigid deformation and obtain substantially more accurate 3D reconstructions.
arXiv Detail & Related papers (2021-06-10T11:21:14Z)
- HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation [39.67289969828706]
We propose a novel hybrid inverse kinematics solution (HybrIK) to bridge the gap between body mesh estimation and 3D keypoint estimation.
HybrIK directly transforms accurate 3D joints to relative body-part rotations for 3D body mesh reconstruction.
We show that HybrIK preserves both the accuracy of 3D pose and the realistic body structure of the parametric human model.
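To make the twist-and-swing idea concrete: the swing is the minimal rotation taking a template bone direction onto the predicted one, while the residual twist about the bone axis is what the network estimates. Below is a small sketch of the swing computation via Rodrigues' formula; it is illustrative, not HybrIK's actual code.

```python
import numpy as np

def swing_rotation(t, p):
    """Minimal rotation aligning unit template bone direction t with unit
    predicted bone direction p (the 'swing'); twist about p is separate."""
    axis = np.cross(t, p)
    s, c = np.linalg.norm(axis), np.dot(t, p)  # sin and cos of swing angle
    if s < 1e-8:
        if c > 0:
            return np.eye(3)                   # already aligned: no swing
        k = np.cross(t, [1.0, 0.0, 0.0])       # antiparallel: 180-degree turn
        if np.linalg.norm(k) < 1e-8:
            k = np.cross(t, [0.0, 1.0, 0.0])
        k /= np.linalg.norm(k)
        return 2.0 * np.outer(k, k) - np.eye(3)
    k = axis / s
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + s * K + (1 - c) * (K @ K)  # Rodrigues' formula

# Usage: rotate the template's bone direction onto the predicted direction.
t = np.array([0.0, 1.0, 0.0])
p = np.array([0.0, 0.0, 1.0])
assert np.allclose(swing_rotation(t, p) @ t, p)
```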
arXiv Detail & Related papers (2020-11-30T10:32:30Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
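A hypothetical sketch of what a depth-to-scale style projection could look like: each joint's 2D coordinates are scaled by that joint's own depth instead of a single body-level scale, so per-joint depth offsets change the apparent size of each part. The function and formulation are assumptions, not the paper's exact D2S definition.

```python
import numpy as np

def d2s_project(joints_3d, focal, root_depth):
    """Perspective-style projection with a per-joint scale focal / z_j,
    where z_j folds each joint's depth offset into the projection."""
    z = root_depth + joints_3d[:, 2]       # per-joint absolute depth
    scale = focal / z                      # per-joint projection scale
    return joints_3d[:, :2] * scale[:, None], scale

# Usage: joints nearer the camera project with a larger per-joint scale.
joints = np.array([[0.0, 0.5, -0.3],      # head, 0.3 m nearer the camera
                   [0.0, 0.0, 0.0],       # pelvis (root)
                   [0.1, -0.8, 0.2]])     # foot, 0.2 m farther away
uv, s = d2s_project(joints, focal=1000.0, root_depth=5.0)
```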
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)