6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal
Inference
- URL: http://arxiv.org/abs/2004.04807v2
- Date: Thu, 16 Jul 2020 07:06:27 GMT
- Title: 6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal
Inference
- Authors: Mai Bui and Tolga Birdal and Haowen Deng and Shadi Albarqouni and
Leonidas Guibas and Slobodan Ilic and Nassir Navab
- Abstract summary: We present a multimodal camera relocalization framework that captures ambiguities and uncertainties.
We predict multiple camera pose hypotheses as well as the respective uncertainty for each prediction.
We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments.
- Score: 67.70859730448473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a multimodal camera relocalization framework that captures
ambiguities and uncertainties with continuous mixture models defined on the
manifold of camera poses. In highly ambiguous environments, which can easily
arise due to symmetries and repetitive structures in the scene, computing one
plausible solution (what most state-of-the-art methods currently regress) may
not be sufficient. Instead, we predict multiple camera pose hypotheses as well
as the respective uncertainty for each prediction. Towards this aim, we use
Bingham distributions to model the orientation of the camera pose and a
multivariate Gaussian to model the position, within an end-to-end deep neural
network. By incorporating a Winner-Takes-All training scheme, we finally obtain
a mixture model that is well suited for explaining ambiguities in the scene,
yet does not suffer from mode collapse, a common problem with mixture density
networks. We introduce a new dataset specifically designed to foster camera
localization research in ambiguous environments and exhaustively evaluate our
method on synthetic as well as real data on both ambiguous scenes and on
non-ambiguous benchmark datasets. We plan to release our code and dataset at
https://multimodal3dvision.github.io.
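The Winner-Takes-All training scheme mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a simple position-only squared-error loss, and the relaxed loser weight `eps` is an assumed detail (a common remedy for mode collapse in multi-hypothesis training).

```python
import numpy as np

def winner_takes_all_loss(hypotheses, target, eps=0.05):
    """Relaxed Winner-Takes-All loss: the best hypothesis receives
    almost all of the gradient weight, while the remaining hypotheses
    share a small epsilon so that no branch starves during training.

    hypotheses: (K, 3) predicted camera positions (illustrative; the
                paper additionally predicts Bingham-distributed
                orientations per hypothesis).
    target:     (3,) ground-truth camera position.
    """
    # Per-hypothesis squared L2 error.
    errors = np.sum((hypotheses - target) ** 2, axis=1)
    k = len(errors)
    # Losers share eps; the winner takes the rest.
    weights = np.full(k, eps / (k - 1))
    weights[np.argmin(errors)] = 1.0 - eps
    return float(np.sum(weights * errors))

hyps = np.array([[0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.1, 0.0, 0.0]])
gt = np.array([0.1, 0.0, 0.0])
loss = winner_takes_all_loss(hyps, gt)
```

Because only the winning branch receives a meaningful gradient, different hypothesis heads are free to specialize on different plausible poses, which is what allows the resulting mixture to explain scene ambiguities.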
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard in accuracy.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- A Probabilistic Framework for Visual Localization in Ambiguous Scenes [64.13544430239267]
We propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose.
We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution.
Our method outperforms existing methods on localization in ambiguous scenes.
arXiv Detail & Related papers (2023-01-05T14:46:54Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving [0.5735035463793008]
We reconstruct a dense 3D model of the geometry of an outdoor environment using a single monocular camera attached to a moving vehicle.
Our system employs dense depth prediction with a hybrid mapping architecture combining state-of-the-art sparse features and dense fusion-based visual SLAM algorithms.
arXiv Detail & Related papers (2021-08-17T16:13:01Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks with a multi-hypotheses prediction head that can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
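Both the main paper and DBN above build on the Bingham distribution, an antipodally symmetric distribution over unit vectors that is well suited to unit quaternions, since a quaternion and its negation represent the same rotation. A minimal sketch of its unnormalized density, p(x) ∝ exp(xᵀ M Z Mᵀ x); the axes and concentration values below are illustrative assumptions, and the (hard-to-compute) normalizing constant is omitted:

```python
import numpy as np

def bingham_unnormalized_pdf(x, M, Z):
    """Unnormalized Bingham density on the unit sphere S^{d-1}:
    p(x) proportional to exp(x^T M Z M^T x), where M is an orthogonal
    matrix of principal axes and Z = diag(z_1, ..., z_d) holds the
    concentration parameters (conventionally z_d = 0 for the mode axis).
    The density is antipodally symmetric: p(x) == p(-x)."""
    C = M @ Z @ M.T
    return float(np.exp(x @ C @ x))

# Illustrative parameters: identity axes, strong concentration around
# the last axis (the mode direction gets concentration zero).
M = np.eye(4)                            # orthogonal principal axes
Z = np.diag([-10.0, -10.0, -10.0, 0.0])  # concentrations, z_4 = 0
mode = np.array([0.0, 0.0, 0.0, 1.0])    # quaternion at the mode
other = np.array([1.0, 0.0, 0.0, 0.0])   # quaternion far from the mode
```

The antipodal symmetry is exactly the property that makes the Bingham family a natural choice for quaternion-valued orientation uncertainty in these pose-estimation works.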
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.