VMLoc: Variational Fusion For Learning-Based Multimodal Camera
Localization
- URL: http://arxiv.org/abs/2003.07289v5
- Date: Thu, 22 Jun 2023 11:55:07 GMT
- Title: VMLoc: Variational Fusion For Learning-Based Multimodal Camera
Localization
- Authors: Kaichen Zhou, Changhao Chen, Bing Wang, Muhamad Risqi U. Saputra, Niki
Trigoni, Andrew Markham
- Abstract summary: We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting.
- Score: 46.607930208613574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent learning-based approaches have achieved impressive results in the
field of single-shot camera localization. However, how best to fuse multiple
modalities (e.g., image and depth) and to deal with degraded or missing input
are less well studied. In particular, we note that previous approaches to
deep fusion do not perform significantly better than models employing a single
modality. We conjecture that this is because of naive feature-space fusion
through summation or concatenation, which does not take into account the
different strengths of each modality. To address this, we propose an
end-to-end framework, termed VMLoc, to fuse different sensor inputs into a
common latent space through a variational Product-of-Experts (PoE) followed by
attention-based fusion. Unlike previous multimodal variational works that
directly adapt the objective function of the vanilla variational auto-encoder,
we show how camera localization can be accurately estimated through an
unbiased objective function based on importance weighting. Our model is
extensively evaluated on RGB-D datasets and the results demonstrate its
efficacy.
The source code is available at https://github.com/kaichen-z/VMLoc.
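The abstract names two ingredients: a variational Product-of-Experts that fuses per-modality Gaussian posteriors into a common latent space, and an attention stage on top of it. The PyTorch sketch below is a minimal, hypothetical rendering of that fusion path, not the authors' implementation (the linked repository holds the real code); the encoders, the `VariationalFusionLocalizer` name, the latent size, and the scalar attention gate are all illustrative assumptions.

```python
# Hypothetical sketch of variational PoE fusion for two modalities (RGB, depth),
# assuming each encoder outputs a diagonal Gaussian over a shared latent space.
# Not the authors' implementation; see https://github.com/kaichen-z/VMLoc.
import torch
import torch.nn as nn


def product_of_experts(mus, logvars):
    """Fuse diagonal Gaussian experts via precision weighting."""
    precisions = [torch.exp(-lv) for lv in logvars]          # 1 / sigma^2
    precision_sum = torch.stack(precisions).sum(dim=0)
    fused_var = 1.0 / precision_sum
    fused_mu = fused_var * torch.stack(
        [m * p for m, p in zip(mus, precisions)]).sum(dim=0)
    return fused_mu, torch.log(fused_var)


class VariationalFusionLocalizer(nn.Module):
    """Two-branch sketch: per-modality Gaussian encoders, PoE latent fusion,
    a simple attention gate over modality features, and a pose head that
    regresses translation (3) plus quaternion rotation (4)."""

    def __init__(self, latent_dim=256):
        super().__init__()
        self.latent_dim = latent_dim
        self.rgb_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(2 * latent_dim))
        self.depth_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(2 * latent_dim))
        self.attn = nn.Linear(latent_dim, 1)       # scalar gate per modality
        self.pose_head = nn.Linear(latent_dim, 7)

    def forward(self, rgb, depth):
        stats = [self.rgb_enc(rgb), self.depth_enc(depth)]
        mus = [s[:, :self.latent_dim] for s in stats]
        logvars = [s[:, self.latent_dim:] for s in stats]

        # A standard-normal prior expert keeps the product well defined even
        # when one modality is degraded or dropped at test time.
        mus.append(torch.zeros_like(mus[0]))
        logvars.append(torch.zeros_like(logvars[0]))
        mu, logvar = product_of_experts(mus, logvars)

        # Reparameterized sample from the fused posterior.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

        # Simple soft attention over the per-modality means (an illustrative
        # stand-in for the paper's attention-based fusion stage).
        gate = torch.softmax(
            torch.cat([self.attn(m) for m in mus[:2]], dim=1), dim=1)
        attended = gate[:, 0:1] * mus[0] + gate[:, 1:2] * mus[1]

        pose = self.pose_head(z + attended)
        return pose, mu, logvar
```

An importance-weighted objective, drawing several latent samples per input and taking a log-mean-exp over their weights in the spirit of the IWAE bound, would replace a single-sample ELBO when training such a model, mirroring the unbiased objective the abstract describes; that training loop is omitted here for brevity.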
Related papers
- Fine-Grained Scene Image Classification with Modality-Agnostic Adapter [8.801601759337006]
We present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter).
We eliminate modality differences in distribution and then use a modality-agnostic Transformer encoder for semantic-level feature fusion.
Our experiments demonstrate that MAA achieves state-of-the-art results on benchmarks using the same modalities as previous methods.
arXiv Detail & Related papers (2024-07-03T02:57:14Z) - Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z) - FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image
Fusion [16.64908104831795]
We present a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks.
Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images.
arXiv Detail & Related papers (2022-09-22T19:06:55Z) - Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object
Tracking [23.130490413184596]
We introduce PointNet++ to obtain multi-scale deep representations of the point cloud, making it adaptive to our proposed Interactive Feature Fusion.
Our method can achieve good performance on the KITTI benchmark and outperform other approaches without using multi-scale feature fusion.
arXiv Detail & Related papers (2022-03-30T13:00:27Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities (see the score-fusion sketch after this list).
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z) - Exploring Data Augmentation for Multi-Modality 3D Object Detection [82.9988604088494]
It is counter-intuitive that multi-modality methods based on point clouds and images perform only marginally better, or sometimes worse, than approaches that solely use point clouds.
We propose a pipeline, named transformation flow, to bridge the gap between single and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
arXiv Detail & Related papers (2020-12-23T15:23:16Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
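The Bayesian Fusion entry above gives the most concrete recipe in this list: a non-learned late fusion of bounding box detections from different modalities. The sketch below illustrates one common probabilistic score-fusion rule under a conditional-independence assumption; it is a generic illustration, not the cited paper's exact formulation, and the function names, greedy IoU matching, and the 0.5 prior are assumptions.

```python
# Hypothetical non-learned late fusion of RGB and thermal detections:
# match boxes across modalities by IoU, then combine their confidences
# assuming the two detector scores are conditionally independent given
# the object label. Illustrative only, not the cited paper's exact rule.

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)


def fuse_scores(p_rgb, p_thermal, prior=0.5):
    """Posterior for 'object present' from two detector confidences, treating
    each score as p(object | that modality)."""
    odds = (p_rgb / (1.0 - p_rgb + 1e-9)) \
        * (p_thermal / (1.0 - p_thermal + 1e-9)) \
        * ((1.0 - prior) / prior)
    return odds / (1.0 + odds)


def late_fuse(dets_rgb, dets_thermal, iou_thresh=0.5):
    """Greedily match boxes across modalities by IoU and fuse matched scores.
    Each detection is a ([x1, y1, x2, y2], score) pair; unmatched detections
    keep their single-modality score."""
    fused, used = [], set()
    for box_r, score_r in dets_rgb:
        best_j, best_iou = -1, iou_thresh
        for j, (box_t, _) in enumerate(dets_thermal):
            overlap = iou(box_r, box_t)
            if j not in used and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            fused.append((box_r, fuse_scores(score_r, dets_thermal[best_j][1])))
        else:
            fused.append((box_r, score_r))
    fused.extend(d for j, d in enumerate(dets_thermal) if j not in used)
    return fused
```

As a sanity check, an RGB detection scored 0.7 fused with an overlapping thermal detection scored 0.8 yields roughly 0.90 under a 0.5 prior: agreement across modalities raises confidence, which is the behaviour late fusion is meant to capture.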
This list is automatically generated from the titles and abstracts of the papers on this site.