PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge
- URL: http://arxiv.org/abs/2406.12219v1
- Date: Tue, 18 Jun 2024 02:41:32 GMT
- Title: PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge
- Authors: Feng Chen, Ling Ding, Kanokphan Lertniphonphan, Jian Li, Kaer Huang, Zhepeng Wang
- Abstract summary: The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image.
To handle the complexity of the task, we propose the Hand Pose Vision Transformer (HP-ViT)
The HP-ViT comprises a ViT backbone and a transformer head to estimate joint positions in 3D, utilizing MPJPE and RLE loss functions.
Our approach achieved the 1st position in the Hand Pose challenge with 25.51 MPJPE and 8.49 PA-MPJPE.
- Score: 12.31892993103657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image provided for the task. This task is particularly challenging due to subtle movements and occlusions. To handle the complexity of the task, we propose the Hand Pose Vision Transformer (HP-ViT). The HP-ViT comprises a ViT backbone and a transformer head to estimate joint positions in 3D, utilizing MPJPE and RLE loss functions. Our approach achieved the 1st position in the Hand Pose challenge with 25.51 MPJPE and 8.49 PA-MPJPE. Code is available at https://github.com/KanokphanL/PCIE_EgoHandPose
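For reference, the sketch below shows how the two reported metrics are typically computed for a single hand of 21 predicted 3D joints: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same error after a Procrustes (similarity) alignment of the prediction onto the ground truth. This is a minimal, unofficial illustration, not the challenge's evaluation code; the function names and the assumption of millimeter units are illustrative.

```python
# Illustrative (unofficial) implementation of MPJPE and PA-MPJPE,
# assuming `pred` and `gt` are (21, 3) arrays of 3D hand joints in mm.
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment (optimal similarity transform of pred onto gt)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix (Kabsch/Umeyama).
    U, s, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(Vt.T @ U.T) < 0:   # guard against an improper rotation (reflection)
        Vt[-1] *= -1
        s[-1] *= -1
    R = Vt.T @ U.T
    scale = s.sum() / (p ** 2).sum()    # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g    # prediction mapped into the ground-truth frame
    return mpjpe(aligned, gt)
```

Lower is better for both metrics; PA-MPJPE factors out global rotation, translation, and scale, which is why it is consistently lower than MPJPE.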
Related papers
- PCIE_Pose Solution for EgoExo4D Pose and Proficiency Estimation Challenge [26.194108651583466]
This report focuses on the task of estimating 21 3D hand joints from RGB egocentric videos.
We developed the Hand Pose Vision Transformer (HP-ViT+) to refine hand pose predictions.
For the EgoExo4D Body Pose Challenge, we adopted a multi-modal spatio-temporal feature integration strategy.
Our methods achieved remarkable performance: 8.31 PA-MPJPE in the Hand Pose Challenge and 11.25 MPJPE in the Body Pose Challenge.
arXiv Detail & Related papers (2025-05-30T09:51:04Z) - The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation [25.320774988055167]
We propose a method to forecast the 3D trajectories and poses of both hands from an egocentric video.
We leverage full-body pose information, allowing other joints to provide constraints on hand motion.
We evaluate EgoH4 on the Ego-Exo4D dataset, combining subsets with body and hand annotations.
arXiv Detail & Related papers (2025-04-11T15:58:31Z) - 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction [3.744155289954746]
This report describes our 1st place solution to the 8th HANDS workshop challenge (ARCTIC track) in conjunction with ECCV 2024.
We address the task of bimanual category-agnostic hand-object interaction reconstruction, which aims to generate 3D reconstructions of both hands and the object from a monocular video.
arXiv Detail & Related papers (2024-09-28T02:51:59Z) - PCIE_LAM Solution for Ego4D Looking At Me Challenge [25.029465595146533]
This report presents our solution for the Ego4D Looking At Me Challenge at CVPR2024.
The main goal of the challenge is to accurately determine if a person in the scene is looking at the camera wearer.
Our approach achieved the 1st position in the looking at me challenge with 0.81 mAP and 0.93 accuracy rate.
arXiv Detail & Related papers (2024-06-18T02:16:32Z) - Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition, and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z) - Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting [8.134443548271301]
We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation.
Our method significantly outperforms the previous state-of-the-art qualitatively and quantitatively.
arXiv Detail & Related papers (2024-02-28T13:50:39Z) - Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement [65.08165593201437]
We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
arXiv Detail & Related papers (2023-11-28T07:13:47Z) - 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction [11.551318550321938]
Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image.
We adopt ViT-based backbones and a simple regressor for 3D keypoint prediction, which provides strong model baselines.
Our method achieved 12.21 mm MPJPE on the test dataset, taking first place in the Egocentric 3D Hand Pose Estimation challenge.
arXiv Detail & Related papers (2023-10-07T10:25:50Z) - EvHandPose: Event-based 3D Hand Pose Estimation with Sparse Supervision [50.060055525889915]
Event cameras show great potential for 3D hand pose estimation, especially in addressing the challenges of fast motion and high dynamic range in a low-power way.
It is challenging to design an event representation that encodes hand motion information, especially when the hands are not moving.
In this paper, we propose EvHandPose, with novel hand flow representations in its Event-to-Pose module for accurate hand pose estimation.
arXiv Detail & Related papers (2023-03-06T03:27:17Z) - Egocentric Video Task Translation @ Ego4D Challenge 2022 [109.30649877677257]
The EgoTask Translation approach explores relations among a set of egocentric video tasks in the Ego4D challenge.
We propose to leverage existing models developed for other related tasks and design a task translator that learns to 'translate' auxiliary task features to the primary task.
Our proposed approach achieves competitive performance on two Ego4D challenges, ranking the 1st in the talking to me challenge and the 3rd in the PNR localization challenge.
arXiv Detail & Related papers (2023-02-03T18:05:49Z) - Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure -- Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z) - Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning [52.37106940303246]
We learn a model that maps noisy input hand poses to target virtual poses.
The agent is trained in a residual setting by using a model-free hybrid RL+IL approach.
We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
arXiv Detail & Related papers (2020-08-07T17:34:28Z) - Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.