DGGAN: Depth-image Guided Generative Adversarial Networks for
Disentangling RGB and Depth Images in 3D Hand Pose Estimation
- URL: http://arxiv.org/abs/2012.03197v1
- Date: Sun, 6 Dec 2020 07:23:21 GMT
- Title: DGGAN: Depth-image Guided Generative Adversarial Networks for
Disentangling RGB and Depth Images in 3D Hand Pose Estimation
- Authors: Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Wei Fan, and
Xiaohui Xie
- Abstract summary: Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images.
We propose a conditional generative adversarial network (GAN) model, called Depth-image Guided GAN (DGGAN), to generate realistic depth maps conditioned on the input RGB image.
Experimental results on multiple benchmark datasets show that the synthesized depth maps produced by DGGAN are quite effective in regularizing the pose estimation model.
- Score: 33.23818997206978
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Estimating 3D hand poses from RGB images is essential to a wide
range of potential applications, but is challenging owing to substantial
ambiguity in the inference of depth information from RGB images.
State-of-the-art estimators address this problem by regularizing 3D hand pose
estimation models during training to enforce the consistency between the
predicted 3D poses and the ground-truth depth maps. However, these estimators
rely on both RGB images and the paired depth maps during training. In this
study, we propose a conditional generative adversarial network (GAN) model,
called Depth-image Guided GAN (DGGAN), to generate realistic depth maps
conditioned on the input RGB image, and use the synthesized depth maps to
regularize the 3D hand pose estimation model, therefore eliminating the need
for ground-truth depth maps. Experimental results on multiple benchmark
datasets show that the synthesized depth maps produced by DGGAN are quite
effective in regularizing the pose estimation model, yielding new
state-of-the-art results in estimation accuracy, notably reducing the mean 3D
end-point errors (EPE) by 4.7%, 16.5%, and 6.8% on the RHD, STB, and MHP
datasets, respectively.
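To make the training setup the abstract describes more concrete, here is a minimal PyTorch-style sketch of that kind of pipeline: a generator that synthesizes a depth map from the RGB input, a discriminator conditioned on the same RGB image, and a pose loss augmented with a consistency term against the synthesized depth. All module and variable names here are hypothetical; the paper does not publish this code, and the real architectures are more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthGenerator(nn.Module):
    """Encoder-decoder mapping a 3-channel RGB image to a 1-channel depth map."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),  # depth in [-1, 1]
        )

    def forward(self, rgb):
        return self.net(rgb)


class DepthDiscriminator(nn.Module):
    """Scores (RGB, depth) pairs, conditioning the GAN on the input image."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, depth):
        return self.net(torch.cat([rgb, depth], dim=1))


def pose_loss_with_depth_reg(pose_net, rgb, joints_gt, fake_depth, lam=0.1):
    """Supervised pose loss plus consistency against the synthesized depth map."""
    pred_joints, pred_depth = pose_net(rgb)  # assumed two-headed estimator
    loss_pose = F.mse_loss(pred_joints, joints_gt)
    # The synthesized depth stands in for the ground-truth depth map,
    # which is what removes the need for paired RGB-depth training data.
    loss_reg = F.l1_loss(pred_depth, fake_depth.detach())
    return loss_pose + lam * loss_reg
```

During training, the discriminator pushes the generated depth maps toward realism, while the detached synthesized depth supplies the regularization signal that ground-truth depth maps would otherwise provide.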
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without labels or training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation [9.760487761422326]
Estimating 2D-3D correspondences between RGB images and 3D space is a fundamental problem in 6D object pose estimation.
Recent pose estimators use dense correspondence maps and Point-to-Point algorithms to estimate object poses.
Recent advances in image-to-image translation have made diffusion models the superior choice when evaluated on benchmark datasets.
arXiv Detail & Related papers (2024-02-09T14:27:40Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation [50.15926681475939]
Inverse graphics aims to infer the 3D scene structure from 2D images.
We introduce probabilistic modeling to quantify uncertainty and achieve robustness in 6D pose estimation tasks.
3DNEL effectively combines learned neural embeddings from RGB with depth information to improve robustness in sim-to-real 6D object pose estimation from RGB-D images.
arXiv Detail & Related papers (2023-02-07T20:48:35Z)
- Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task.
Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection.
Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects.
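Read together, the two ideas suggest an auxiliary loss of roughly this shape: an uncertainty-weighted depth error averaged only over foreground pixels. The sketch below is a hedged illustration; the function and tensor names are invented here, and the paper's exact formulation may differ.

```python
import torch


def object_centric_depth_loss(pred_depth, pred_log_b, gt_depth, fg_mask):
    """Uncertainty-weighted (Laplacian NLL) depth error over foreground pixels.

    pred_log_b is a per-pixel log scale: exp(-pred_log_b) downweights pixels
    the network is uncertain about, while the +pred_log_b term penalizes
    claiming high uncertainty everywhere. fg_mask marks pixels on or near
    foreground objects. All names are illustrative, not the paper's API.
    """
    fg = fg_mask.float()
    nll = torch.abs(pred_depth - gt_depth) * torch.exp(-pred_log_b) + pred_log_b
    # Average only over foreground pixels, which matter for 3D detection.
    return (nll * fg).sum() / fg.sum().clamp(min=1.0)
```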
arXiv Detail & Related papers (2022-10-29T11:32:28Z)
- DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation [24.770767430749288]
We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector).
We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose.
DPODv2 achieves excellent results on the evaluated datasets while remaining fast and scalable, independent of the data modality and the type of training data.
arXiv Detail & Related papers (2022-07-06T16:48:56Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation [8.946655323517092]
TriHorn-Net is a novel model that introduces two innovations to improve hand pose estimation accuracy on depth images.
The first innovation is the decomposition of 3D hand pose estimation into the estimation of 2D joint locations in the depth image space and of their corresponding depths.
The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images.
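As a rough illustration of what an appearance-based augmentation for depth images might look like, the sketch below randomly replaces a fraction of depth pixels with a background value. The name and details here are guesses at the spirit of PixDropout, not the paper's actual implementation.

```python
import torch


def pix_dropout(depth, drop_prob=0.1, fill_value=0.0):
    """Randomly replace a fraction of depth pixels with a background value.

    depth is assumed to be a float tensor; the paper's mask shape and fill
    strategy may differ from this sketch.
    """
    mask = torch.rand_like(depth) < drop_prob
    return depth.masked_fill(mask, fill_value)
```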
arXiv Detail & Related papers (2022-06-14T19:08:42Z)
- Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data [73.9872931307401]
We propose a novel weakly-supervised framework to train a monocular depth estimation network.
The proposed framework is composed of a sharing weight monocular depth estimation network and a depth reconstruction network for distillation.
Experimental results demonstrate that our method achieves superior performance to unsupervised and semi-supervised learning-based schemes.
arXiv Detail & Related papers (2021-09-23T18:04:12Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
During inference, it takes as inputs a LiDAR point cloud and a single RGB image, and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
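A minimal sketch of such a mixed objective appears below, assuming sparse projected LiDAR depth as the supervised signal and an edge-aware smoothness term as the self-supervised component (both common choices; the paper's actual losses and weights may differ, and all names here are illustrative).

```python
import torch
import torch.nn.functional as F


def joint_detection_depth_loss(pred_pose, gt_pose, pred_depth, lidar_depth,
                               rgb, weights=(1.0, 1.0, 0.1)):
    """Supervised pose regression + supervised sparse depth + self-supervised smoothness.

    lidar_depth is assumed to be the LiDAR point cloud projected into the
    image plane, with zeros where no return exists.
    """
    loss_det = F.smooth_l1_loss(pred_pose, gt_pose)

    valid = lidar_depth > 0  # supervise only where LiDAR provides depth
    loss_sup = F.l1_loss(pred_depth[valid], lidar_depth[valid])

    # Edge-aware smoothness: penalize depth gradients where the image is flat.
    d_dx = (pred_depth[..., :, 1:] - pred_depth[..., :, :-1]).abs()
    i_dx = (rgb[..., :, 1:] - rgb[..., :, :-1]).abs().mean(dim=1, keepdim=True)
    loss_self = (d_dx * torch.exp(-i_dx)).mean()

    return weights[0] * loss_det + weights[1] * loss_sup + weights[2] * loss_self
```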
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)