1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction
- URL: http://arxiv.org/abs/2310.04769v2
- Date: Tue, 10 Oct 2023 03:48:32 GMT
- Title: 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction
- Authors: Zhishan Zhou, Zhi Lv, Shihao Zhou, Minqiang Zou, Tong Wu, Mochen Yu, Yao Tang, Jiajun Liang
- Abstract summary: Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image.
We adopt ViT-based backbones and a simple regressor for 3D keypoint prediction, which provides strong model baselines.
Our method achieved 12.21mm MPJPE on the test dataset, taking first place in the Egocentric 3D Hand Pose Estimation challenge.
- Score: 11.551318550321938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This report introduces our work for the Egocentric 3D Hand Pose Estimation workshop. Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image. In the competition, we adopt ViT-based backbones and a simple regressor for 3D keypoint prediction, which provides strong model baselines. We noticed that hand-object occlusions and self-occlusions lead to performance degradation, so we propose a model-free method that merges multi-view results in the post-processing stage. Moreover, we utilize test-time augmentation and model ensembling for further improvement. We also found that public datasets and rational preprocessing are beneficial. Our method achieved 12.21mm MPJPE on the test dataset, taking first place in the Egocentric 3D Hand Pose Estimation challenge.
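As a reading aid, here is a minimal PyTorch sketch of the pipeline the abstract describes: a ViT-style backbone, a simple regressor for 3D keypoints, and a model-free merge of per-view predictions in post-processing. The names (`HandPoseNet`, `merge_views`), layer sizes, joint count, and the confidence-weighted averaging rule are our assumptions, not details released with the report.

```python
# Illustrative sketch only -- layer sizes, joint count, and the merging
# rule are assumptions, not the authors' released configuration.
import torch
import torch.nn as nn

NUM_JOINTS = 21  # standard single-hand keypoint count (assumption)

class HandPoseNet(nn.Module):
    """ViT-style encoder + simple MLP regressor for 3D keypoints."""
    def __init__(self, dim=384, depth=6, heads=6, patch=16):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(dim),
                                  nn.Linear(dim, NUM_JOINTS * 3))

    def forward(self, img):                      # img: (B, 3, H, W)
        tok = self.patch_embed(img)              # (B, dim, H/16, W/16)
        tok = tok.flatten(2).transpose(1, 2)     # (B, N, dim) patch tokens
        feat = self.encoder(tok).mean(dim=1)     # global average pooling
        return self.head(feat).view(-1, NUM_JOINTS, 3)

def merge_views(poses_world, scores):
    """Model-free merge: confidence-weighted average of per-view
    predictions after transforming them into a shared world frame.
    poses_world: (V, J, 3), scores: (V,)."""
    w = scores / scores.sum()
    return (poses_world * w.view(-1, 1, 1)).sum(dim=0)

pred = HandPoseNet()(torch.randn(1, 3, 224, 224))  # -> (1, 21, 3)
```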
Related papers
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
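The HandDiff entry above describes iteratively denoising joint coordinates conditioned on image-point-cloud features. The sketch below shows a generic DDPM-style reverse loop over a (J, 3) pose; the noise predictor `eps_model`, the schedule, and the step count are placeholders of ours, not HandDiff's actual formulation.

```python
# Generic DDPM-style reverse process over joint coordinates.
# The noise predictor and schedule are placeholders, not HandDiff's own.
import torch

def denoise_pose(eps_model, cond, steps=50, num_joints=21):
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(num_joints, 3)              # start from pure noise
    for t in reversed(range(steps)):
        eps = eps_model(x, cond, t)             # predict the added noise
        coef = (1 - alphas[t]) / (1 - alpha_bar[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise      # ancestral sampling step
    return x                                    # denoised 3D hand pose
```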
- AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation [26.261767086366866]
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations.
AssemblyHands provides 3.0M annotated images, including 490K egocentric images.
Our study shows that having higher-quality hand poses directly improves the ability to recognize actions.
arXiv Detail & Related papers (2023-04-24T17:52:57Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation [8.946655323517092]
TriHorn-Net is a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images.
The first innovation is the decomposition of the 3D hand pose estimation into the estimation of 2D joint locations in the depth image space.
The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images.
arXiv Detail & Related papers (2022-06-14T19:08:42Z)
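PixDropout is summarized above only as an appearance-based augmentation for hand depth images. A plausible minimal reading is random per-pixel dropout on the depth map; the drop probability and fill value below are our guesses, not the paper's definition.

```python
# Guessed reading of an appearance-based depth augmentation:
# randomly replace a fraction of depth pixels with a background value.
import torch

def pix_dropout(depth, p=0.1, fill=0.0):
    """depth: (H, W) float depth map; p: per-pixel drop probability."""
    mask = torch.rand_like(depth) < p
    return depth.masked_fill(mask, fill)
```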
- Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure -- Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z)
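The Ego-STAN entry describes a spatio-temporal Transformer attending to CNN feature maps across past frames. Below is a schematic version under our own assumptions about feature shapes and layer sizes; it is not the published architecture.

```python
# Schematic spatio-temporal Transformer over per-frame CNN features.
# Shapes, joint count, and layer sizes are our assumptions, not Ego-STAN's.
import torch
import torch.nn as nn

class SpatioTemporalPose(nn.Module):
    def __init__(self, dim=256, joints=16):
        super().__init__()
        self.cnn = nn.Sequential(                     # tiny stand-in CNN
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))                  # (dim, 4, 4) per frame
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, joints * 3)

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1))              # (B*T, dim, 4, 4)
        tok = f.flatten(2).transpose(1, 2)            # spatial tokens
        tok = tok.reshape(B, T * tok.shape[1], -1)    # time x space tokens
        out = self.transformer(tok).mean(dim=1)       # attend across frames
        return self.head(out).view(B, -1, 3)          # (B, joints, 3)
```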
- Efficient Virtual View Selection for 3D Hand Pose Estimation [50.93751374572656]
We propose a new virtual view selection and fusion module for 3D hand pose estimation from single depth.
Our virtual view selection and fusion module is both effective and efficient for 3D hand pose estimation.
arXiv Detail & Related papers (2022-03-29T11:57:53Z)
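The virtual-view entry above proposes viewing the depth data from multiple synthesized viewpoints and fusing the per-view estimates. The sketch below rotates a point cloud into a few virtual views, runs a single-view estimator, rotates predictions back, and averages; the rotation set and uniform weighting are illustrative choices of ours.

```python
# Illustrative virtual-view fusion: rotate the point cloud, estimate per
# view, rotate predictions back, and average. Views and weights are ours.
import numpy as np

def rot_y(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

def fuse_virtual_views(points, estimator, angles=(-30, 0, 30)):
    """points: (N, 3) hand point cloud; estimator: (N, 3) -> (J, 3)."""
    poses = []
    for deg in angles:
        R = rot_y(deg)
        pose_v = estimator(points @ R.T)   # estimate in the virtual view
        poses.append(pose_v @ R)           # rotate back: R^{-1} = R^T
    return np.mean(poses, axis=0)
```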
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
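MRP-Net is summarized as a shared backbone with two differently configured output heads plus pose- and joint-level uncertainty measures. One natural instantiation of the joint-level measure is disagreement between the two heads; the sketch below uses per-joint inter-head distance, which is our reading rather than the paper's exact formula.

```python
# Joint-level uncertainty as disagreement between two prediction heads.
# Using inter-head distance is our reading, not the paper's exact measure.
import torch

def joint_uncertainty(pose_a, pose_b):
    """pose_a, pose_b: (J, 3) predictions from the two heads.
    Returns per-joint uncertainty (J,) and a scalar pose-level score."""
    per_joint = (pose_a - pose_b).norm(dim=-1)
    return per_joint, per_joint.mean()
```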
- Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.
The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner.
Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
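The D2S entry above describes folding per-joint depth differences into the projection function so that each joint gets its own scale. Under a standard pinhole model, a per-joint scale f / (Z_root + dZ_j) does exactly that; the sketch below is our reconstruction under that assumption, with hypothetical intrinsics.

```python
# Pinhole-style depth-to-scale projection: joint j is scaled by
# f / (Z_root + dZ_j). Our reconstruction of the idea, not the paper's code.
import numpy as np

def d2s_project(joints, z_root, f=1000.0, c=(112.0, 112.0)):
    """joints: (J, 3) root-relative 3D joints (X, Y, dZ) in camera axes.
    Returns (J, 2) pixel coordinates with a per-joint scale variant."""
    scale = f / (z_root + joints[:, 2])          # per-joint scale
    u = joints[:, 0] * scale + c[0]
    v = joints[:, 1] * scale + c[1]
    return np.stack([u, v], axis=1)
```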
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)