DIREG3D: DIrectly REGress 3D Hands from Multiple Cameras
- URL: http://arxiv.org/abs/2201.11187v1
- Date: Wed, 26 Jan 2022 21:03:56 GMT
- Title: DIREG3D: DIrectly REGress 3D Hands from Multiple Cameras
- Authors: Ashar Ali, Upal Mahbub, Gokce Dane, Gerhard Reitmayr
- Abstract summary: DIREG3D is capable of utilizing camera parameters, 3D geometry, intermediate 2D cues, and visual information to regress parameters for accurately representing a Hand Mesh model.
We extend these results to a multi-view camera setup by fusing features from different viewpoints.
- Score: 0.22940141855172028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present DIREG3D, a holistic framework for 3D Hand Tracking.
The proposed framework is capable of utilizing camera intrinsic parameters, 3D
geometry, intermediate 2D cues, and visual information to regress parameters
for accurately representing a Hand Mesh model. Our experiments show that
information like the size of the 2D hand, its distance from the optical center,
and radial distortion is useful for deriving highly reliable 3D poses in camera
space from just monocular information. Furthermore, we extend these results to
a multi-view camera setup by fusing features from different viewpoints.
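The abstract's claim that the 2D hand's size and its offset from the optical center carry depth information follows from basic pinhole geometry: for a hand of roughly known physical size, apparent image size scales inversely with distance. The sketch below illustrates that monocular cue together with a simple averaging stand-in for multi-view feature fusion. It is a minimal illustration under a pinhole model with an assumed canonical hand size; none of the function names or constants come from the paper.

```python
import numpy as np

# Illustrative prior: a canonical hand size in metres. This value is an
# assumption for the sketch, not a constant from the DIREG3D paper.
CANONICAL_HAND_SIZE_M = 0.18

def depth_from_hand_size(bbox_height_px, fy, hand_size_m=CANONICAL_HAND_SIZE_M):
    """Pinhole similar-triangles cue: apparent size shrinks with distance,
    so depth = focal_length * real_size / image_size. This is the kind of
    monocular signal the abstract says helps recover camera-space poses."""
    return fy * hand_size_m / bbox_height_px

def backproject(u, v, depth, K):
    """Lift a pixel (u, v) at a known depth into camera space with
    intrinsics K (radial distortion assumed already corrected)."""
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.array([x, y, depth])

def fuse_views(per_view_features):
    """Toy stand-in for multi-view feature fusion: average the per-view
    feature vectors into one shared representation."""
    return np.mean(np.stack(per_view_features, axis=0), axis=0)

# Example: a 90 px-tall hand seen at focal length fy = 600 px sits about
# 1.2 m away under the canonical-size assumption.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
z = depth_from_hand_size(90.0, K[1, 1])      # ~1.2 m
wrist_cam = backproject(350.0, 260.0, z, K)  # camera-space 3D point
fused = fuse_views([np.random.randn(128), np.random.randn(128)])
```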
Related papers
- SYM3D: Learning Symmetric Triplanes for Better 3D-Awareness of GANs [5.84660008137615]
SYM3D is a novel 3D-aware GAN designed to leverage the prevalent symmetry structure found in natural and man-made objects.
We demonstrate its superior performance in capturing detailed geometry and texture, even when trained on only single-view images.
arXiv Detail & Related papers (2024-06-10T16:24:07Z)
- Self-learning Canonical Space for Multi-view 3D Human Pose Estimation [57.969696744428475]
Multi-view 3D human pose estimation is naturally superior to its single-view counterpart.
However, the accurate annotations it relies on are hard to obtain.
We propose a fully self-supervised framework named the cascaded multi-view aggregating network (CMANet).
CMANet outperforms state-of-the-art methods in extensive quantitative and qualitative analyses.
arXiv Detail & Related papers (2024-03-19T04:54:59Z)
- MagicDrive: Street View Generation with Diverse 3D Geometry Control [82.69871576797166]
We introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls.
Our design incorporates a cross-view attention module, ensuring consistency across multiple camera views.
arXiv Detail & Related papers (2023-10-04T06:14:06Z)
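For readers unfamiliar with the cross-view attention module mentioned in the MagicDrive summary above, the sketch below shows the general pattern: tokens from one camera view attend to tokens from another so that overlapping content stays consistent. It is a generic single-head attention in NumPy under assumed token shapes, not MagicDrive's actual module.

```python
import numpy as np

def cross_view_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Generic single-head cross-attention: tokens from one camera view
    (queries) attend to tokens of another view (keys/values), the usual
    mechanism for keeping overlapping content consistent across views."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Example: 16 tokens of width 32 per view, random projection weights.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(16, 32))
view_b = rng.normal(size=(16, 32))
Wq, Wk, Wv = (rng.normal(size=(32, 32)) for _ in range(3))
out = cross_view_attention(view_a, view_b, Wq, Wk, Wv)  # shape (16, 32)
```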
- Multi-Modal Dataset Acquisition for Photometrically Challenging Objects [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization [3.5379706873065917]
We propose Ray3D, a novel monocular ray-based absolute 3D human pose estimation approach that uses a calibrated camera.
Our method significantly outperforms existing state-of-the-art models.
arXiv Detail & Related papers (2022-03-22T05:42:31Z)
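The "ray-based" idea in the Ray3D summary above amounts to converting each detected 2D keypoint into a normalized 3D viewing ray via the calibrated intrinsics, making the representation independent of focal length and image resolution. A minimal sketch of that conversion, with made-up camera values, follows; it shows only the ray construction, not Ray3D's network.

```python
import numpy as np

def pixel_to_ray(u, v, K):
    """Back-project a pixel through calibrated intrinsics K into a unit
    viewing ray in camera coordinates: d ~ K^{-1} [u, v, 1]^T."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
ray = pixel_to_ray(400.0, 300.0, K)  # every 3D point on this ray images at (400, 300)
```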
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
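The classical bundle-adjustment baseline that MetaPose is compared against reduces to minimising reprojection error over the 3D joints (and, in general, the camera parameters). A minimal residual function for fixed cameras is sketched below as an assumed illustration of that classical baseline, not of MetaPose itself.

```python
import numpy as np

def reprojection_residuals(X, projections, observations):
    """The quantity classical bundle adjustment minimises: for each camera
    with a 3x4 projection matrix P and observed 2D keypoints (N, 2),
    compare the projections of the 3D joints X (N, 3) to the observations."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    residuals = []
    for P, uv in zip(projections, observations):
        proj = Xh @ P.T
        proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
        residuals.append((proj - uv).ravel())
    return np.concatenate(residuals)
```

In a full pipeline these residuals would be handed to a non-linear least-squares solver such as scipy.optimize.least_squares, iterating over the joints and, in the general case, the cameras.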
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
One solution is to train on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
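The decoding step described in the summary above, conditioning a view-disentangled pose representation on camera projection operators, has a simple geometric analogue: once a pose estimate lives in a shared world frame, each camera's projection matrix maps it back into that camera's image. The sketch below shows only this geometric projection with made-up camera parameters; the paper's learned conditioning is more involved.

```python
import numpy as np

def project_to_view(X_world, K, R, t):
    """Map world-frame 3D joints (N, 3) into one camera's image using its
    projection operator P = K [R | t]."""
    X_cam = X_world @ R.T + t      # world frame -> camera frame
    uv = X_cam @ K.T               # camera frame -> image plane
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

# Toy example: one camera at the origin looking down +z.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
joints = np.array([[0.0, 0.0, 2.0],
                   [0.1, -0.05, 2.1]])
uv = project_to_view(joints, K, np.eye(3), np.zeros(3))
```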
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.