Neural Voting Field for Camera-Space 3D Hand Pose Estimation
- URL: http://arxiv.org/abs/2305.04328v1
- Date: Sun, 7 May 2023 16:51:34 GMT
- Title: Neural Voting Field for Camera-Space 3D Hand Pose Estimation
- Authors: Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu
- Abstract summary: We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
- Score: 106.34750803910714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified framework for camera-space 3D hand pose estimation from
a single RGB image based on 3D implicit representation. As opposed to recent
works, most of which first adopt holistic or pixel-level dense regression to
obtain relative 3D hand pose and then follow with complex second-stage
operations for 3D global root or scale recovery, we propose a novel unified 3D
dense regression scheme to estimate camera-space 3D hand pose via dense 3D
point-wise voting in camera frustum. Through direct dense modeling in 3D domain
inspired by Pixel-aligned Implicit Functions for 3D detailed reconstruction,
our proposed Neural Voting Field (NVF) fully models 3D dense local evidence and
hand global geometry, helping to alleviate common 2D-to-3D ambiguities.
Specifically, for a 3D query point in camera frustum and its pixel-aligned
image feature, NVF, represented by a Multi-Layer Perceptron, regresses: (i) its
signed distance to the hand surface; (ii) a set of 4D offset vectors (1D voting
weight and 3D directional vector to each hand joint). Following a vote-casting
scheme, 4D offset vectors from near-surface points are selected to calculate
the 3D hand joint coordinates by a weighted average. Experiments demonstrate
that NVF outperforms existing state-of-the-art algorithms on the FreiHAND
dataset for camera-space 3D hand pose estimation. We also adapt NVF to the
classic task of root-relative 3D hand pose estimation, for which NVF also
obtains state-of-the-art results on the HO3D dataset.
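To make the vote-casting scheme concrete, here is a minimal NumPy sketch of the aggregation step only, not the authors' implementation: the array names (`sdf`, `weights`, `directions`) and the near-surface threshold `tau` are illustrative assumptions.

```python
import numpy as np

def cast_votes(points, sdf, weights, directions, tau=0.02):
    """Aggregate per-point votes into 3D joint coordinates.

    points:     (N, 3)    3D query points sampled in the camera frustum
    sdf:        (N,)      predicted signed distance of each point to the hand surface
    weights:    (N, J)    predicted 1D voting weight per point and joint
    directions: (N, J, 3) predicted 3D offset vector from each point to each joint
    tau:        near-surface selection threshold (hypothetical value)
    """
    near = np.abs(sdf) < tau                       # keep near-surface points only
    p, w, d = points[near], weights[near], directions[near]
    votes = p[:, None, :] + d                      # (M, J, 3) per-point joint proposals
    w = w / (w.sum(axis=0, keepdims=True) + 1e-8)  # normalize weights over points
    return (w[..., None] * votes).sum(axis=0)      # (J, 3) weighted-average joints

# toy usage: 1000 query points, 21 hand joints
rng = np.random.default_rng(0)
joints = cast_votes(rng.normal(size=(1000, 3)),
                    rng.normal(scale=0.05, size=1000),
                    rng.random(size=(1000, 21)),
                    rng.normal(scale=0.1, size=(1000, 21, 3)))
print(joints.shape)  # (21, 3)
```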
Related papers
- 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics [19.64111218032901]
We present a novel technique to estimate the 6D pose of objects from single images.
We employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel.
Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
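Pipelines built on dense 2D-to-3D correspondences typically recover the final 6D pose with a PnP solver over the predicted per-pixel model coordinates; the abstract does not spell out this step, so the sketch below is an assumption using OpenCV's solvePnPRansac, with illustrative variable names.

```python
import numpy as np
import cv2

def pose_from_dense_correspondences(obj_coords, mask, K):
    """Recover a 6D pose from dense per-pixel 3D model coordinates.

    obj_coords: (H, W, 3) 3D model coordinates regressed for every pixel
    mask:       (H, W)    foreground pixels where the prediction is valid
    K:          (3, 3)    camera intrinsics
    """
    v, u = np.nonzero(mask)                        # pixel coords of valid correspondences
    pts_2d = np.stack([u, v], axis=1).astype(np.float64)
    pts_3d = obj_coords[v, u].astype(np.float64)   # matching 3D model coordinates
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)                     # rotation vector -> 3x3 matrix
    return R, tvec
```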
arXiv Detail & Related papers (2023-03-23T13:18:05Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
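Sampling query points in the camera frustum can be done by back-projecting random pixels to random depths along their viewing rays; the sketch below is a generic illustration with an assumed depth range, not the paper's sampling strategy.

```python
import numpy as np

def sample_frustum_points(K, h, w, n, near=0.2, far=2.0, rng=None):
    """Sample n 3D points inside the camera frustum of an h x w image.

    K: (3, 3) camera intrinsics; near/far: assumed depth range in meters.
    """
    rng = rng or np.random.default_rng()
    u = rng.uniform(0, w, n)                    # random pixel coordinates
    v = rng.uniform(0, h, n)
    z = rng.uniform(near, far, n)               # random depths along each ray
    pix = np.stack([u, v, np.ones(n)], axis=0)  # homogeneous pixel coords, (3, n)
    rays = np.linalg.inv(K) @ pix               # back-projected viewing rays
    return (rays * z).T                         # (n, 3) camera-space points
```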
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation [13.71137201718831]
The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation.
In contrast to the conventional approach of representing a 3D volume as a discrete voxel grid, we parameterize it as the zero level-set of a continuous function.
Our proposed model, ImplicitVol, takes a set of 2D scans and their estimated locations in 3D as input, jointly refining the estimated 3D locations and learning a full reconstruction of the 3D volume.
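The core of a deep implicit representation is a coordinate network queried at continuous 3D locations; the sketch below shows a generic coordinate MLP mapping a 3D location to an intensity value, not the ImplicitVol architecture itself.

```python
import torch
import torch.nn as nn

class ImplicitVolume(nn.Module):
    """Minimal coordinate MLP: maps a 3D location to an image intensity.

    A generic implicit-representation sketch, not the paper's network.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):               # xyz: (N, 3) query locations
        return self.net(xyz).squeeze(-1)  # (N,) predicted intensities

# training idea: sample pixels from the 2D scans, map them to 3D via the
# (jointly refined) scan poses, and regress the observed intensities
model = ImplicitVolume()
pred = model(torch.rand(1024, 3))
```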
arXiv Detail & Related papers (2021-09-24T17:59:18Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
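The plane-sweep idea can be illustrated per joint: hypothesize depths along the ray of a 2D detection in a reference view, project each hypothesis into another view, and keep the depth with the smallest reprojection error. The two-view simplification and all names below are assumptions for illustration, not the paper's single-shot network.

```python
import numpy as np

def sweep_depth(joint_ref, K_ref, K_src, R, t, joint_src, depths):
    """Score candidate depths for one joint detected at joint_ref (u, v) in the
    reference view, using a 2D detection joint_src in a second view.

    R, t map reference-camera coordinates into the source camera.
    """
    ray = np.linalg.inv(K_ref) @ np.array([joint_ref[0], joint_ref[1], 1.0])
    best_depth, best_err = None, np.inf
    for z in depths:                          # sweep over depth hypotheses
        X_src = R @ (ray * z) + t             # 3D hypothesis in the source camera
        uvw = K_src @ X_src
        uv = uvw[:2] / uvw[2]                 # projected pixel in the source view
        err = np.linalg.norm(uv - np.asarray(joint_src, dtype=float))
        if err < best_err:                    # keep the most consistent depth
            best_depth, best_err = z, err
    return best_depth
```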
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation [18.103595280706593]
We leverage recent advances in reliable 2D pose estimation with CNN to estimate the 3D pose of people from depth images.
Our approach achieves very competitive results both in accuracy and speed on two public datasets.
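A decoupled 2D-then-3D pipeline of this kind rests on back-projecting 2D joint detections with the depth image and camera intrinsics; the sketch below shows only that lifting step, without the paper's residual refinement, and the names are illustrative.

```python
import numpy as np

def lift_joints(joints_2d, depth_map, K):
    """Back-project 2D joint detections to 3D using a depth image.

    joints_2d: (J, 2) pixel coordinates; depth_map: (H, W) depths in meters;
    K: (3, 3) intrinsics. A minimal lifting sketch, not the full pipeline.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    out = []
    for u, v in joints_2d:
        z = depth_map[int(round(v)), int(round(u))]  # read depth at the joint
        out.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
    return np.asarray(out)                           # (J, 3) camera-space joints
```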
arXiv Detail & Related papers (2020-11-10T10:08:13Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward way to avoid the information loss of 3D-to-2D projection is to keep the 3D representation and process the points directly in 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
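The cylinder partition itself reduces to converting points to cylindrical coordinates (rho, phi, z) and binning them on a regular grid in that space; the sketch below uses illustrative grid sizes and ranges, not Cylinder3D's actual settings.

```python
import numpy as np

def cylindrical_voxelize(points, grid=(480, 360, 32),
                         rho_max=50.0, z_range=(-4.0, 2.0)):
    """Assign LiDAR points (N, 3) to cells of a cylindrical grid (rho, phi, z).

    Grid resolution and ranges are assumed values for illustration.
    Returns integer cell indices of shape (N, 3).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)  # radial distance from the sensor
    phi = np.arctan2(y, x)      # azimuth angle in [-pi, pi]
    r_idx = np.clip(rho / rho_max * grid[0], 0, grid[0] - 1).astype(int)
    p_idx = ((phi + np.pi) / (2 * np.pi) * grid[1]).astype(int) % grid[1]
    z_idx = np.clip((z - z_range[0]) / (z_range[1] - z_range[0]) * grid[2],
                    0, grid[2] - 1).astype(int)
    return np.stack([r_idx, p_idx, z_idx], axis=1)
```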
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map [72.93634777578336]
We propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
The proposed approach improves over the state of the art by 47.8% on the SynHand5M dataset.
Our method produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets.
arXiv Detail & Related papers (2020-04-03T14:27:16Z)