Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization
- URL: http://arxiv.org/abs/2203.11471v1
- Date: Tue, 22 Mar 2022 05:42:31 GMT
- Title: Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization
- Authors: Yu Zhan, Fenghai Li, Renliang Weng, Wongun Choi
- Abstract summary: We propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation approach with a calibrated camera.
Our method significantly outperforms existing state-of-the-art models.
- Score: 3.5379706873065917
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute
human pose estimation approach with a calibrated camera. Accurate and generalizable
absolute 3D human pose estimation from monocular 2D pose input is an ill-posed
problem. To address this challenge, we convert the input from pixel space to 3D
normalized rays. This conversion makes our approach robust to camera intrinsic
parameter changes. To deal with the in-the-wild camera extrinsic parameter
variations, Ray3D explicitly takes the camera extrinsic parameters as an input
and jointly models the distribution between the 3D pose rays and camera
extrinsic parameters. This novel network design is the key to the outstanding
generalizability of the Ray3D approach. To have a comprehensive understanding of
how the camera intrinsic and extrinsic parameter variations affect the accuracy
of absolute 3D key-point localization, we conduct in-depth systematic
experiments on three single person 3D benchmarks as well as one synthetic
benchmark. These experiments demonstrate that our method significantly
outperforms existing state-of-the-art models. Our code and the synthetic
dataset are available at https://github.com/YxZhxn/Ray3D .
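To make the core idea concrete, the pixel-to-ray conversion described in the abstract can be sketched as follows. This is a minimal illustration assuming a standard pinhole camera model; the function name pixels_to_rays, the example intrinsics, and the way extrinsics are appended to the input are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def pixels_to_rays(keypoints_2d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift 2D keypoints (J, 2) in pixel space to unit-norm 3D rays (J, 3).

    Back-projecting through the inverse intrinsic matrix removes the
    dependence on focal length and principal point, which is what makes
    a ray-based input robust to intrinsic-parameter changes.
    """
    num_joints = keypoints_2d.shape[0]
    homo = np.concatenate([keypoints_2d, np.ones((num_joints, 1))], axis=1)  # (J, 3)
    rays = homo @ np.linalg.inv(K).T            # each row is K^-1 [u, v, 1]^T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

# Hypothetical camera: 1000 px focal length, 1920x1080 image.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
keypoints = np.array([[ 960.0, 540.0],    # a joint at the principal point
                      [1200.0, 700.0]])   # an off-center joint
rays = pixels_to_rays(keypoints, K)

# Ray3D additionally feeds the camera extrinsics to the network; one simple
# way to sketch that is to flatten rotation and translation into a vector
# and concatenate it with the flattened rays as the model input.
R, t = np.eye(3), np.zeros(3)               # placeholder extrinsics
model_input = np.concatenate([rays.ravel(), R.ravel(), t])
```

A joint at the principal point always maps to the ray (0, 0, 1) no matter what the focal length is, which illustrates the intrinsic-invariance the abstract describes.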
Related papers
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text [61.9973218744157]
We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories.
Experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.
arXiv Detail & Related papers (2024-06-25T14:42:51Z)
- X-Ray: A Sequential 3D Representation For Generation [54.160173837582796]
We introduce X-Ray, a novel 3D sequential representation inspired by x-ray scans.
X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images.
arXiv Detail & Related papers (2024-04-22T16:40:11Z)
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
arXiv Detail & Related papers (2024-04-11T17:59:45Z)
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration [12.55056916519563]
Previous methods for monocular camera calibration rely on specific 3D objects or strong geometry priors.
Our method is assumption-free and calibrates the complete 4 Degree-of-Freedom (DoF) intrinsic parameters.
We demonstrate downstream applications in image manipulation detection & restoration, uncalibrated two-view pose estimation, and 3D sensing.
arXiv Detail & Related papers (2023-06-19T14:55:26Z)
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
- 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics [19.64111218032901]
We present a novel technique to estimate the 6D pose of objects from single images.
We employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel; the 6D pose can then be recovered from these correspondences (see the PnP sketch after this list).
Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
arXiv Detail & Related papers (2023-03-23T13:18:05Z)
- DIREG3D: DIrectly REGress 3D Hands from Multiple Cameras [0.22940141855172028]
DIREG3D is capable of utilizing camera parameters, 3D geometry, intermediate 2D cues, and visual information to regress parameters for accurately representing a Hand Mesh model.
We extend these results to a multi-view camera setup by fusing features from different viewpoints.
arXiv Detail & Related papers (2022-01-26T21:03:56Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study the problem with a solution built on a fully convolutional one-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation [39.67289969828706]
We propose a novel hybrid inverse kinematics solution (HybrIK) to bridge the gap between body mesh estimation and 3D keypoint estimation.
HybrIK directly transforms accurate 3D joints to relative body-part rotations for 3D body mesh reconstruction.
We show that HybrIK preserves both the accuracy of 3D pose and the realistic body structure of the parametric human model.
arXiv Detail & Related papers (2020-11-30T10:32:30Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
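For the 6D pose entry above, a standard way to turn dense 2D-to-3D correspondences into a pose is a RANSAC PnP solve. The sketch below uses OpenCV; the intrinsics and the correspondences are synthetic placeholders standing in for network predictions, not the paper's actual pipeline.

```python
import cv2
import numpy as np

# Intrinsics for a hypothetical 640x480 camera (placeholder values).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Synthetic stand-in for a network's dense predictions: 3D model
# coordinates for a set of pixels. We generate the pixels from a known
# ground-truth pose so the solve has something consistent to recover.
rng = np.random.default_rng(0)
object_points = rng.uniform(-0.5, 0.5, (100, 3)).astype(np.float32)
rvec_gt = np.array([0.1, -0.2, 0.05])   # ground-truth rotation (Rodrigues)
tvec_gt = np.array([0.0, 0.0, 4.0])     # ground-truth translation
image_points, _ = cv2.projectPoints(object_points, rvec_gt, tvec_gt, K, None)
image_points = image_points.reshape(-1, 2).astype(np.float32)

# RANSAC PnP rejects outlier correspondences (mispredicted pixels) while
# solving for the object's rotation and translation.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)           # 3x3 rotation matrix
    print("recovered t:", tvec.ravel())  # close to (0, 0, 4)
```

The RANSAC variant matters for dense per-pixel predictions, where a fraction of the regressed 3D coordinates are inevitably wrong and would corrupt a plain least-squares solve.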
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.