JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network
for 3D Hand Pose Estimation from a Single Depth Image
- URL: http://arxiv.org/abs/2007.04646v2
- Date: Fri, 10 Jul 2020 03:49:36 GMT
- Title: JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network
for 3D Hand Pose Estimation from a Single Depth Image
- Authors: Linpu Fang, Xingyan Liu, Li Liu, Hang Xu, and Wenxiong Kang
- Abstract summary: State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions.
A novel pixel-wise prediction-based method is proposed to address the above issues.
The proposed model is implemented with an efficient 2D fully convolutional network backbone and has only about 1.4M parameters.
- Score: 28.753759115780515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art single depth image-based 3D hand pose estimation methods are
based on dense predictions, including voxel-to-voxel predictions,
point-to-point regression, and pixel-wise estimations. Despite their good
performance, these methods have several inherent issues, such as a poor
trade-off between accuracy and efficiency and plain feature representation
learning with local convolutions. In this paper, a novel pixel-wise
prediction-based method is proposed to address the above issues. The key ideas
are two-fold: a) explicitly modeling the dependencies among joints and the
relations between the pixels and the joints for better local feature
representation learning; b) unifying the dense pixel-wise offset predictions
and direct joint regression for end-to-end training. Specifically, we first
propose a graph convolutional network (GCN) based joint graph reasoning module
to model the complex dependencies among joints and augment the representation
capability of each pixel. Then we densely estimate all pixels' offsets to
joints in both the image plane and depth space and calculate the joints'
positions by a weighted average over all pixels' predictions, entirely
discarding complex post-processing operations. The proposed model is
implemented with an
efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M
parameters. Extensive experiments on multiple 3D hand pose estimation
benchmarks demonstrate that the proposed method achieves new state-of-the-art
accuracy while running very efficiently at around 110 fps on a single NVIDIA
1080Ti GPU.
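To make the two key ideas in the abstract concrete, the following is a minimal PyTorch-style sketch written for this summary. It is not the authors' code: the module names (JointGraphReasoning, decode_joints), the attention-based pixel-to-joint pooling, the learned adjacency matrix, and all tensor shapes are illustrative assumptions; the actual JGR-P2O backbone, graph construction, and losses differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointGraphReasoning(nn.Module):
    """Hypothetical GCN-style reasoning over J joint nodes.

    Pixel features are pooled into per-joint node features, propagated over a
    learned joint adjacency matrix, and mapped back to enhance every pixel.
    """

    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.to_joint_maps = nn.Conv2d(channels, num_joints, kernel_size=1)  # pixel-to-joint attention
        self.adjacency = nn.Parameter(torch.eye(num_joints))                 # learned joint graph (assumption)
        self.node_update = nn.Linear(channels, channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:      # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        attn = self.to_joint_maps(feats).flatten(2).softmax(-1)  # (B, J, HW)
        nodes = attn @ feats.flatten(2).transpose(1, 2)          # (B, J, C) pooled joint features
        nodes = F.relu(self.node_update(self.adjacency.softmax(-1) @ nodes))  # graph propagation
        back = (attn.transpose(1, 2) @ nodes).transpose(1, 2).reshape(B, C, H, W)
        return self.fuse(torch.cat([feats, back], dim=1))        # pixels augmented with joint context


def decode_joints(offset_uv, offset_z, weights, coords, depth):
    """Pixel-to-offset decoding by a weighted average (no argmax post-processing).

    offset_uv: (B, J, 2, H, W) per-pixel offsets to each joint in the image plane
    offset_z:  (B, J, H, W)    per-pixel offsets to each joint in depth
    weights:   (B, J, H, W)    per-pixel confidence logits for each joint
    coords:    (1, 1, 2, H, W) pixel coordinate grid
    depth:     (B, 1, H, W)    input depth map
    """
    w = weights.flatten(2).softmax(-1)                           # normalize confidences over all pixels
    uv = (coords + offset_uv).flatten(3)                         # each pixel votes for a joint position
    joint_uv = (uv * w.unsqueeze(2)).sum(-1)                     # (B, J, 2) weighted average in the image plane
    joint_z = ((depth + offset_z).flatten(2) * w).sum(-1)        # (B, J)    weighted average in depth
    return torch.cat([joint_uv, joint_z.unsqueeze(-1)], dim=-1)  # (B, J, 3) joint positions
```

Because the weighted-average decoding is differentiable, a direct joint regression loss on the decoder output can be trained end-to-end together with the dense pixel-wise offset supervision, which is what removes the need for argmax-style post-processing.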
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z) - Improving 3D Pose Estimation for Sign Language [38.20064386142944]
This work addresses 3D human pose reconstruction in single images.
We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose.
arXiv Detail & Related papers (2023-08-18T13:05:10Z) - Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard in terms of accuracy.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Contour Context: Abstract Structural Distribution for 3D LiDAR Loop
Detection and Metric Pose Estimation [31.968749056155467]
This paper proposes a simple, effective, and efficient topological loop closure detection pipeline with accurate 3-DoF metric pose estimation.
We interpret the Cartesian bird's eye view (BEV) image projected from 3D LiDAR points as a layered distribution of structures.
A retrieval key is designed to accelerate the search of a database indexed by layered KD-trees.
arXiv Detail & Related papers (2023-02-13T07:18:24Z) - Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z) - DFM: A Performance Baseline for Deep Feature Matching [10.014010310188821]
The proposed method uses a pre-trained VGG architecture as a feature extractor and does not require any additional training to improve matching.
Our algorithm achieves overall scores of 0.57 and 0.80 in terms of Mean Matching Accuracy (MMA) for 1-pixel and 2-pixel thresholds, respectively, on the HPatches dataset.
arXiv Detail & Related papers (2021-06-14T22:55:06Z) - A hybrid classification-regression approach for 3D hand pose estimation
using graph convolutional networks [1.0152838128195467]
We propose a two-stage GCN-based framework that learns per-pose relationship constraints.
The first phase quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality.
The second stage uses a GCN-based module that employs an adaptive nearest neighbor algorithm to determine joint relationships.
arXiv Detail & Related papers (2021-05-23T10:09:10Z) - I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human
Pose and Mesh Estimation from a Single RGB Image [79.040930290399]
We propose I2L-MeshNet, an image-to-lixel (line+pixel) prediction network.
The proposed I2L-MeshNet predicts the per-lixel likelihood on 1D heatmaps for each mesh coordinate instead of directly regressing the parameters.
Our lixel-based 1D heatmap preserves the spatial relationship in the input image and models the prediction uncertainty.
arXiv Detail & Related papers (2020-08-09T12:13:31Z) - Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)