Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
Estimation
- URL: http://arxiv.org/abs/2012.11002v1
- Date: Sun, 20 Dec 2020 19:20:26 GMT
- Title: Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
Estimation
- Authors: Haowen Deng, Mai Bui, Nassir Navab, Leonidas Guibas, Slobodan Ilic,
Tolga Birdal
- Abstract summary: Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks by (i) a multi-hypothesis prediction head that can yield different distribution modes; and (ii) novel loss functions that benefit from Bingham distributions on rotations.
We propose new training strategies to avoid mode or posterior collapse during training and to improve numerical stability.
- Score: 74.76155168705975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce Deep Bingham Networks (DBN), a generic framework
that can naturally handle pose-related uncertainties and ambiguities arising in
almost all real-life applications concerning 3D data. While existing works
strive to find a single solution to the pose estimation problem, we make peace
with the ambiguities that cause high uncertainty about which solution to
identify as the best. Instead, we report a family of poses that captures the
nature of the solution space. DBN extends state-of-the-art direct pose
regression networks by (i) a multi-hypothesis prediction head that can yield
different distribution modes; and (ii) novel loss functions that benefit from
Bingham distributions on rotations. This way, DBN can work both in unambiguous
cases, providing uncertainty information, and in ambiguous scenes, where an
uncertainty per mode is desired. On the technical front, our network regresses
continuous Bingham mixture models and is applicable both to 2D data such as
images and to 3D data such as point clouds. We propose new training strategies
to avoid mode or posterior collapse during training and to improve numerical
stability. Our methods are thoroughly tested on two different applications
exploiting two different modalities: (i) 6D camera relocalization from images;
and (ii) object pose estimation from 3D point clouds, demonstrating clear
advantages over the state of the art. For the former, we contribute our own
dataset composed of five indoor scenes in which it is unavoidable to capture
images of views that are hard to uniquely identify. For the latter, we achieve
top results, especially for the symmetric objects of the ModelNet dataset.
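The density DBN regresses is the Bingham distribution, an antipodally symmetric distribution over unit quaternions parametrized by an orthogonal matrix M and non-positive concentrations Z. As a minimal illustrative sketch (not the paper's implementation; function names, the winner-takes-all loss variant, and the identity-matrix example are assumptions for illustration), the unnormalized log-density and a multi-hypothesis loss that discourages mode collapse could look like this:

```python
import numpy as np

def bingham_log_density_unnorm(q, M, Z):
    """Unnormalized Bingham log-density: log p(q) = q^T M diag(Z) M^T q + const.

    q : (4,) unit quaternion; q and -q get the same density (antipodal symmetry)
    M : (4, 4) orthogonal matrix whose columns are the dispersion directions
    Z : (4,) concentrations with Z[i] <= 0; by convention the mode's entry is 0
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)  # project onto the unit sphere S^3
    return float(q @ M @ np.diag(Z) @ M.T @ q)

def winner_takes_all_loss(hypothesis_log_liks):
    """Penalize only the best of several hypotheses, leaving the other
    prediction heads free to cover different modes of the posterior."""
    return -max(hypothesis_log_liks)

# Toy example: mode along the last column of M = I, i.e. quaternion (0, 0, 0, 1)
M = np.eye(4)
Z = np.array([-10.0, -10.0, -10.0, 0.0])
at_mode = bingham_log_density_unnorm([0.0, 0.0, 0.0, 1.0], M, Z)   # 0.0
off_mode = bingham_log_density_unnorm([1.0, 0.0, 0.0, 0.0], M, Z)  # -10.0
```

In this sketch, larger (less negative) values of Z along a direction mean more dispersion around the mode; the winner-takes-all scheme is one common strategy against mode collapse, while the paper proposes its own training strategies for the same purpose.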
Related papers
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to output camera poses directly, either by regressing the 6DoF camera poses or by formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z) - DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - Ambiguity-Aware Multi-Object Pose Optimization for Visually-Assisted
Robot Manipulation [17.440729138126162]
We present an ambiguity-aware 6D object pose estimation network, PrimA6D++, as a generic uncertainty prediction method.
The proposed method shows a significant performance improvement in T-LESS and YCB-Video datasets.
We further demonstrate real-time scene recognition capability for visually-assisted robot manipulation.
arXiv Detail & Related papers (2022-11-02T08:57:20Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust
Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z) - 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal
Inference [67.70859730448473]
We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
arXiv Detail & Related papers (2020-04-09T20:55:06Z)