1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
- URL: http://arxiv.org/abs/2409.19362v2
- Date: Tue, 8 Oct 2024 08:18:32 GMT
- Title: 1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
- Authors: Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang
- Abstract summary: We present a method that uses multi-view input images and camera parameters to estimate both hand shape and pose.
Our method achieves 13.92mm MPJPE on the UmeTrack dataset and 21.66mm MPJPE on the HOT3D dataset.
- Score: 8.462982928029135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view egocentric hand tracking is a challenging task and plays a critical role in VR interaction. In this report, we present a method that uses multi-view input images and camera extrinsic parameters to estimate both hand shape and pose. To reduce overfitting to the camera layout, we apply crop jittering and extrinsic parameter noise augmentation. Additionally, we propose an offline neural smoothing post-processing method to further improve the accuracy of hand position and pose. Our method achieves 13.92mm MPJPE on the UmeTrack dataset and 21.66mm MPJPE on the HOT3D dataset.
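The abstract names two augmentations without further detail, so the following is only a loose illustration: a minimal NumPy sketch of crop jittering on a square hand-crop box and of extrinsic noise applied to a 4x4 world-to-camera matrix. All function names, noise scales, and conventions here are assumptions, not details from the paper.

```python
import numpy as np

def jitter_crop(box, scale_range=(0.9, 1.1), shift_frac=0.05, rng=None):
    """Randomly scale and shift a square hand-crop box (cx, cy, size).

    Hypothetical magnitudes: the report only states that crop jittering
    is used, not how strong it is.
    """
    rng = rng or np.random.default_rng()
    cx, cy, size = box
    size *= rng.uniform(*scale_range)
    cx += rng.uniform(-shift_frac, shift_frac) * size
    cy += rng.uniform(-shift_frac, shift_frac) * size
    return cx, cy, size

def axis_angle_to_matrix(v):
    """Rodrigues' formula: rotation vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-8:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def jitter_extrinsics(T, rot_std_rad=0.01, trans_std_m=0.005, rng=None):
    """Perturb a 4x4 world-to-camera extrinsic with small rotation and
    translation noise (assumed convention; noise scales are illustrative)."""
    rng = rng or np.random.default_rng()
    noise = np.eye(4)
    noise[:3, :3] = axis_angle_to_matrix(rng.normal(0.0, rot_std_rad, 3))
    noise[:3, 3] = rng.normal(0.0, trans_std_m, 3)
    return noise @ T

# Example: augment an identity extrinsic.
T_aug = jitter_extrinsics(np.eye(4))
```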
Related papers
- 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023
Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction [11.551318550321938]
Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image.
We adopt ViT-based backbones and a simple regressor for 3D keypoint prediction, which provides strong model baselines.
Our method achieved 12.21mm MPJPE on the test dataset, taking first place in the Egocentric 3D Hand Pose Estimation challenge.
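As a rough illustration of the "ViT-based backbone plus simple regressor" baseline this entry describes, here is a minimal PyTorch sketch. The backbone choice (torchvision's vit_b_16), the head widths, and the 21-joint output are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

class HandKeypointRegressor(nn.Module):
    """ViT backbone + MLP regressor for 21 3D hand keypoints (illustrative)."""

    def __init__(self, num_joints=21):
        super().__init__()
        self.backbone = vit_b_16(weights=None)  # expects 224x224 input
        self.backbone.heads = nn.Identity()     # expose the 768-d class token
        self.head = nn.Sequential(
            nn.Linear(768, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_joints * 3),
        )
        self.num_joints = num_joints

    def forward(self, x):                       # x: (B, 3, 224, 224)
        feat = self.backbone(x)                 # (B, 768)
        return self.head(feat).view(-1, self.num_joints, 3)

model = HandKeypointRegressor()
joints = model(torch.randn(2, 3, 224, 224))     # -> (2, 21, 3)
```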
arXiv Detail & Related papers (2023-10-07T10:25:50Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment [22.531044994763487]
We propose a novel multi-camera multiple-people tracking method that uses anchor-guided clustering for cross-camera re-assignment.
Our approach aims to improve tracking accuracy by identifying key features that are unique to each individual.
The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data.
arXiv Detail & Related papers (2023-04-19T07:38:15Z)
- Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods struggle to estimate an accurate correction field because of their uniform-velocity assumption.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state of the art by +4.98, +0.77, and +4.33 dB PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
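To make the test-time optimization idea concrete, here is a heavily simplified PyTorch sketch that jointly fits an inverse-depth map and tiny per-frame 2D motion to a burst by minimizing photometric error against the reference frame. The paper fits a neural RGB-D representation under a full projective camera model, so the parameterization, initialization, and hyperparameters below are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def fit_depth_and_motion(frames, steps=500, lr=1e-2):
    """Toy test-time optimization: frames is an (N, 3, H, W) burst in [0, 1]."""
    N, _, H, W = frames.shape
    # Non-zero initializations so gradients flow through the depth-scaled shift.
    inv_depth = torch.full((1, 1, H, W), 0.5, requires_grad=True)
    motion = torch.randn(N, 2).mul_(0.01).requires_grad_()  # per-frame (dx, dy) in pixels
    opt = torch.optim.Adam([inv_depth, motion], lr=lr)

    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).unsqueeze(0)        # (1, H, W, 2)
    scale = torch.tensor([2.0 / W, 2.0 / H])                 # pixels -> grid units

    for _ in range(steps):
        # Parallax model: apparent pixel shift = per-frame motion x inverse depth.
        shift = motion.view(N, 1, 1, 2) * inv_depth.squeeze(1).unsqueeze(-1)
        warped = F.grid_sample(frames, base + shift * scale, align_corners=True)
        loss = F.l1_loss(warped, frames[:1].expand_as(warped))
        opt.zero_grad(); loss.backward(); opt.step()
    return inv_depth.detach(), motion.detach()
```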
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- Multi-task Learning for Camera Calibration [3.274290296343038]
We present a unique method for predicting intrinsic (principal point offset and focal length) and extrinsic (baseline, pitch, and translation) properties from a pair of images.
By reconstructing 3D points with a camera-model neural network and using the reconstruction error as supervision, this camera projection loss (CPL) allows the desired parameters to be estimated.
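As a hedged sketch of a reconstruction-based calibration loss in this spirit, the snippet below back-projects stereo correspondences using predicted focal length, baseline, and principal point, then penalizes the 3D reconstruction error so gradients flow into the camera parameters. The paper's exact parameterization and loss may differ; everything here is illustrative.

```python
import torch

def camera_projection_loss(pred, left_uv, right_uv, gt_xyz):
    """pred: dict of predicted camera parameters ('focal', 'baseline',
    'cx', 'cy'), each a scalar tensor. left_uv / right_uv: (N, 2) pixel
    correspondences (left x assumed > right x). gt_xyz: (N, 3) reference points.
    """
    f, b = pred["focal"], pred["baseline"]
    cx, cy = pred["cx"], pred["cy"]
    disparity = (left_uv[:, 0] - right_uv[:, 0]).clamp(min=1e-6)
    z = f * b / disparity                    # depth from stereo disparity
    x = (left_uv[:, 0] - cx) * z / f         # back-project with pinhole model
    y = (left_uv[:, 1] - cy) * z / f
    xyz = torch.stack([x, y, z], dim=-1)
    return torch.nn.functional.smooth_l1_loss(xyz, gt_xyz)
```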
arXiv Detail & Related papers (2022-11-22T17:39:31Z)
- Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios [13.59950629234404]
This report describes our 1st place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation).
In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands and an object interact, seen from an egocentric viewpoint.
Our proposed method performs end-to-end multi-hand pose estimation via transformer architecture.
arXiv Detail & Related papers (2022-10-20T16:24:47Z)
- Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras [35.67288909201899]
We propose a lightweight total motion capture system for multi-person interactive scenarios using only sparse multi-view cameras.
Our method efficiently localizes and accurately associates hands and faces even under severe occlusion.
Overall, we propose the first lightweight total-capture system, achieving fast, robust, and accurate multi-person total motion capture.
arXiv Detail & Related papers (2021-08-23T19:23:35Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
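To illustrate what "conditioning on camera projection operators" can look like, the sketch below projects a canonical 3D pose into each view with per-camera 3x4 projection matrices. In the paper the per-view decoding is learned, so this fixed pinhole projection is only an assumption-laden stand-in.

```python
import torch

def project_pose(pose_xyz, proj):
    """Project a shared 3D pose into every camera view.

    pose_xyz: (B, J, 3) 3D joints in a common world frame.
    proj:     (B, V, 3, 4) per-view projection matrices (K @ [R|t]).
    Returns:  (B, V, J, 2) per-view 2D keypoints.
    """
    ones = torch.ones_like(pose_xyz[..., :1])
    homo = torch.cat([pose_xyz, ones], dim=-1)        # (B, J, 4)
    # (B, V, 3, 4) @ (B, 1, 4, J) -> (B, V, 3, J) via broadcasting
    cam = proj @ homo.transpose(1, 2).unsqueeze(1)
    # Perspective divide, then reorder to (B, V, J, 2).
    return (cam[:, :, :2] / cam[:, :, 2:3].clamp(min=1e-6)).transpose(2, 3)

uv = project_pose(torch.randn(2, 17, 3), torch.randn(2, 4, 3, 4))  # (2, 4, 17, 2)
```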
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.