Related papers: A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks

URL: http://arxiv.org/abs/2105.10902v1
Date: Sun, 23 May 2021 10:09:10 GMT
Title: A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks
Authors: Ikram Kourbane, Yakup Genc
Abstract summary: We propose a two-stage GCN-based framework that learns per-pose relationship constraints. The first stage quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality. The second stage uses a GCN-based module with an adaptive nearest neighbor algorithm to determine joint relationships.
Score: 1.0152838128195467
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Hand pose estimation is a crucial part of a wide range of augmented reality and human-computer interaction applications. Predicting the 3D hand pose from a single RGB image is challenging due to occlusion and depth ambiguities. GCN-based (Graph Convolutional Network) methods exploit the structural similarity between graphs and hand joints to model kinematic dependencies between joints. These techniques use predefined or globally learned joint relationships, which may fail to capture pose-dependent constraints. To address this problem, we propose a two-stage GCN-based framework that learns per-pose relationship constraints. Specifically, the first stage quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality. This spatial dependency information guides the first stage to estimate reliable 2D and 3D poses. The second stage further improves the 3D estimation through a GCN-based module that uses an adaptive nearest neighbor algorithm to determine joint relationships. Extensive experiments show that our multi-stage GCN approach yields an efficient model that produces accurate 2D/3D hand poses and outperforms the state-of-the-art on two public datasets.
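The two ideas named in the abstract, quantizing the coordinate space into blocks and linking joints by nearest neighbors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`quantize_joints`, `block_labels`, `knn_adjacency`), the uniform grid, and the fixed neighbor count `k` are assumptions made for illustration only. In particular, the paper describes an adaptive nearest neighbor algorithm, whereas the sketch below uses plain fixed-k nearest neighbors.

```python
import numpy as np

def quantize_joints(joints, lo, hi, bins):
    """Map each joint coordinate to a per-axis block index on a uniform grid.

    Assumption: coordinates lie in [lo, hi] and every axis uses `bins` blocks.
    """
    joints = np.asarray(joints, dtype=float)
    scaled = (joints - lo) / (hi - lo) * bins
    # Clip so that a coordinate exactly equal to `hi` falls in the last block.
    return np.clip(np.floor(scaled).astype(int), 0, bins - 1)

def block_labels(idx, bins):
    """Flatten per-axis block indices (J, D) into one class label per joint."""
    return np.ravel_multi_index(tuple(idx.T), (bins,) * idx.shape[1])

def knn_adjacency(joints, k):
    """Build a symmetric 0/1 adjacency matrix from fixed-k nearest neighbors."""
    joints = np.asarray(joints, dtype=float)
    n = len(joints)
    dist = np.linalg.norm(joints[:, None, :] - joints[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a joint is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]   # k closest joints per row
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(adj, adj.T)            # make the relation symmetric

# Toy 2D example with three hypothetical joints in the unit square.
toy = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.1]]
labels = block_labels(quantize_joints(toy, 0.0, 1.0, bins=4), bins=4)
adjacency = knn_adjacency(toy, k=1)
```

In this sketch the block labels would serve as classification targets for a first stage, and the adjacency matrix would define which joints exchange information in a graph convolution. Because the adjacency is recomputed from each pose, it differs from a predefined skeleton graph, which is the general motivation the abstract gives for per-pose relationships.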

Related papers

PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation [5.223657684081615]
Existing monocular 3D pose estimation methods rely on joint positional features, while overlooking intrinsic directional and angular correlations within the skeleton. We propose the PoseGRAF framework to address these challenges. Experimental results on the Human3.6M and MPI-INF-3DHP datasets show that our method exceeds state-of-the-art approaches.
arXiv Detail & Related papers (2025-06-17T14:59:56Z)
3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer [5.303583360581161]
We propose a new method that exploits the graph modeling capability of GCN to represent each skeleton with multiple graphs of different orders. The resulting spatial features of the sequence are processed using a proposed temporal Body Aware Transformer. Experiments on Human3.6M, MPI-INF-3DHP, and HumanEva-I datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2025-05-02T04:58:04Z)
Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. We propose a dual-stage Foundation-to-Diffusion framework that precisely aligns 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z)
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
3D Hand Reconstruction via Aggregating Intra and Inter Graphs Guided by Prior Knowledge for Hand-Object Interaction Scenario [8.364378460776832]
We propose a 3D hand reconstruction network combining the benefits of model-based and model-free approaches to balance accuracy and physical plausibility in hand-object interaction scenarios. First, we present a novel module that regresses MANO pose parameters directly from 2D joints, which avoids the highly nonlinear mapping from abstract image features.
arXiv Detail & Related papers (2024-03-04T05:11:26Z)
Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. We introduce a new weighted Jacobi feature rule obtained through graph filtering with implicit propagation fairing. We also employ adjacency modulation with the aim of learning meaningful correlations beyond those defined between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z)
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery [84.67823511418334]
This paper presents the 3D JOint contrastive learning with TRansformers (JOTR) framework for handling occluded 3D human mesh recovery. Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D & 3D aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z)
Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization.
arXiv Detail & Related papers (2023-07-29T20:46:44Z)
Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from monocular single RGB images. Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos [47.601288796052714]
Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. The new Dynamical Graph Network (DGNet) can estimate 3D pose by adaptively learning spatial/temporal joint relations from videos.
arXiv Detail & Related papers (2021-09-15T15:06:19Z)
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e., person localization and pose estimation. We propose three task-specific graph neural networks for effective message passing. Our approach achieves state-of-the-art performance on the CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation [7.559220068352681]
We propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time. Our network uses a cascade of two adaptive graph convolutional neural networks, one to estimate 2D coordinates of the hand joints and object corners, followed by another to convert 2D coordinates to 3D.
arXiv Detail & Related papers (2020-03-31T19:01:42Z)
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to the person's limbs. It operates by first detecting 2D poses from the two signals and then lifting them to 3D space. The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts. Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses. Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.