Related papers: Towards Egocentric 3D Hand Pose Estimation in Unseen Domains

Towards Egocentric 3D Hand Pose Estimation in Unseen Domains

URL: http://arxiv.org/abs/2601.06537v1
Date: Sat, 10 Jan 2026 11:36:01 GMT
Title: Towards Egocentric 3D Hand Pose Estimation in Unseen Domains
Authors: Wiktor Mucha, Michael Wray, Martin Kampel,
Abstract summary: V-HPOT is a novel approach for improving the cross-domain performance of 3D hand pose estimation.<n>Method addresses this by estimating keypoint z-coordinates in a virtual camera space.<n>Self-supervised test-time optimisation strategy refines the model's depth perception during inference.
Score: 12.514577938360574
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present V-HPOT, a novel approach for improving the cross-domain performance of 3D hand pose estimation from egocentric images across diverse, unseen domains. State-of-the-art methods demonstrate strong performance when trained and tested within the same domain. However, they struggle to generalise to new environments due to limited training data and depth perception -- overfitting to specific camera intrinsics. Our method addresses this by estimating keypoint z-coordinates in a virtual camera space, normalised by focal length and image size, enabling camera-agnostic depth prediction. We further leverage this invariance to camera intrinsics to propose a self-supervised test-time optimisation strategy that refines the model's depth perception during inference. This is achieved by applying a 3D consistency loss between predicted and in-space scale-transformed hand poses, allowing the model to adapt to target domain characteristics without requiring ground truth annotations. V-HPOT significantly improves 3D hand pose estimation performance in cross-domain scenarios, achieving a 71% reduction in mean pose error on the H2O dataset and a 41% reduction on the AssemblyHands dataset. Compared to state-of-the-art methods, V-HPOT outperforms all single-stage approaches across all datasets and competes closely with two-stage methods, despite needing approximately x3.5 to x14 less data.

Related papers

Direction-Aware Hybrid Representation Learning for 3D Hand Pose and Shape Estimation [41.96019347138128]
We propose learning direction-aware hybrid features (DaHyF) that fuse implicit image features and explicit 2D joint coordinate features.<n>Our method directly predicts 3D hand poses with DaHyF representation and reduces jittering during motion capture using prediction confidence based on contrastive learning.
arXiv Detail & Related papers (2025-04-02T02:06:23Z)
3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions. We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression. Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning [1.4515751892711464]
We propose an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-22T17:59:01Z)
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation. It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views. We propose a distributed representation of camera pose that treats a camera as a bundle of rays. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View [11.958753088613637]
We first analyze the causes of the domain gap for the MV3D-Det task. To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera. We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z)
TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation [8.946655323517092]
TriHorn-Net is a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images. The first innovation is the decomposition of the 3D hand pose estimation into the estimation of 2D joint locations in the depth image space. The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images.
arXiv Detail & Related papers (2022-06-14T19:08:42Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (Pn$PL) aims at fast, accurate and robust camera localizations with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
Wide-Baseline Relative Camera Pose Estimation with Directional Learning [46.21836501895394]
We introduce DirectionNet, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable. We evaluate our model on challenging synthetic and real pose estimation datasets constructed from Matterport3D and InteriorNet.
arXiv Detail & Related papers (2021-06-07T04:46:09Z)
Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.