Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
- URL: http://arxiv.org/abs/2407.12307v1
- Date: Wed, 17 Jul 2024 04:05:34 GMT
- Title: Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
- Authors: Yufei Zhang, Jeffrey O. Kephart, Qiang Ji
- Abstract summary: Fully-supervised monocular 3D hand reconstruction is often difficult because capturing the requisite 3D data entails deploying specialized equipment in a controlled environment.
We introduce a weakly-supervised method that avoids such requirements by leveraging fundamental principles well-established in the understanding of the human hand's unique structure and functionality.
Our method achieves nearly a 21% performance improvement on the widely adopted FreiHAND dataset.
- Score: 27.175214956244798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully-supervised monocular 3D hand reconstruction is often difficult because capturing the requisite 3D data entails deploying specialized equipment in a controlled environment. We introduce a weakly-supervised method that avoids such requirements by leveraging fundamental principles well-established in the understanding of the human hand's unique structure and functionality. Specifically, we systematically study hand knowledge from different sources, including biomechanics, functional anatomy, and physics. We effectively incorporate these valuable foundational insights into 3D hand reconstruction models through an appropriate set of differentiable training losses. This enables training solely with readily-obtainable 2D hand landmark annotations and eliminates the need for expensive 3D supervision. Moreover, we explicitly model the uncertainty that is inherent in image observations. We enhance the training process by exploiting a simple yet effective Negative Log Likelihood (NLL) loss that incorporates uncertainty into the loss function. Through extensive experiments, we demonstrate that our method significantly outperforms state-of-the-art weakly-supervised methods. For example, our method achieves nearly a 21% performance improvement on the widely adopted FreiHAND dataset.
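The abstract does not give the exact form of the NLL loss, so the following is only a minimal sketch of one common way to fold predicted uncertainty into a 2D landmark loss. It assumes an isotropic Gaussian per landmark with a predicted log-variance; the function name `gaussian_nll_2d` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_nll_2d(pred_xy, log_var, target_xy):
    """Mean negative log-likelihood of 2D landmarks under an isotropic Gaussian.

    Assumption (not from the paper): each landmark has its own predicted
    log-variance log(sigma^2). The constant log(2*pi) term is dropped.

    pred_xy, target_xy: arrays of shape (N, 2)
    log_var: array of shape (N,)
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    log_var = np.asarray(log_var, dtype=float)

    # Squared 2D distance between predicted and annotated landmarks.
    sq_err = np.sum((pred_xy - target_xy) ** 2, axis=-1)

    # For a 2D isotropic Gaussian:
    #   -log p = sq_err / (2 * sigma^2) + log(sigma^2) + log(2 * pi)
    nll = 0.5 * sq_err * np.exp(-log_var) + log_var
    return float(nll.mean())


if __name__ == "__main__":
    pred = [[0.0, 0.0], [10.0, 10.0]]
    target = [[3.0, 4.0], [10.0, 10.0]]
    # With unit variance the loss reduces to half the mean squared error.
    print(gaussian_nll_2d(pred, [0.0, 0.0], target))
    # A larger predicted variance on the noisy landmark down-weights its error.
    print(gaussian_nll_2d(pred, [np.log(12.5), 0.0], target))
```

In this form, a landmark with a large predicted variance contributes less of its squared error, while the `log_var` term penalizes the model for claiming high uncertainty everywhere.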
Related papers
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
- AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training [64.14759275211115]
We propose a depth-aware robust adversarial training method for monocular 3D object detection, dubbed DART3D.
Our adversarial training approach capitalizes on the inherent uncertainty, enabling the model to significantly improve its robustness against adversarial attacks.
arXiv Detail & Related papers (2023-09-03T07:05:32Z)
- Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey [23.113633046349314]
3D hand pose estimation has potential to enable various applications, such as video understanding, AR/VR, and robotics.
However, the performance of models is tied to the quality and quantity of annotated 3D hand poses.
We examine methods for learning 3D hand poses when annotated data are scarce, including self-supervised pretraining, semi-supervised learning, and domain adaptation.
arXiv Detail & Related papers (2022-06-05T20:18:52Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
- Self Context and Shape Prior for Sensorless Freehand 3D Ultrasound Reconstruction [61.62191904755521]
3D freehand US reconstruction is promising in addressing the problem by providing broad range and freeform scan.
Existing deep learning based methods only focus on the basic cases of skill sequences.
We propose a novel approach to sensorless freehand 3D US reconstruction considering the complex skill sequences.
arXiv Detail & Related papers (2021-07-31T16:06:50Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3d human model, GHUM, trained end-to-end.
Central to our methodology is a learning to learn and optimize approach, referred to as HUman Neural Descent (HUND), which avoids both second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.