Multimodal Object Detection via Bayesian Fusion
- URL: http://arxiv.org/abs/2104.02904v1
- Date: Wed, 7 Apr 2021 04:03:20 GMT
- Title: Multimodal Object Detection via Bayesian Fusion
- Authors: Yi-Ting Chen, Jinghao Shi, Christoph Mertz, Shu Kong, Deva Ramanan
- Abstract summary: We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
- Score: 59.31437166291557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection with multimodal inputs can improve many safety-critical
perception systems such as autonomous vehicles (AVs). Motivated by AVs that
operate in both day and night, we study multimodal object detection with RGB
and thermal cameras, since the latter can provide much stronger object
signatures under poor illumination. We explore strategies for fusing
information from different modalities. Our key contribution is a non-learned
late-fusion method that fuses together bounding box detections from different
modalities via a simple probabilistic model derived from first principles. Our
simple approach, which we call Bayesian Fusion, is readily derived from
conditional independence assumptions across different modalities. We apply our
approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR)
multimodal sensor data. Our Bayesian Fusion outperforms prior work by more than
13% in relative performance.
Related papers
- MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation [13.624431305114564]
This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, designed for late fusion to enable multi-class detection.
Experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements.
Our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy.
arXiv Detail & Related papers (2024-10-11T11:58:35Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - MMDR: A Result Feature Fusion Object Detection Approach for Autonomous
System [5.499393552545591]
The proposed approach, called Multi-Modal Detector based on Result features (MMDR), is designed to work for both 2D and 3D object detection tasks.
The MMDR model incorporates shallow global features during the feature fusion stage, endowing the model with the ability to perceive background information.
arXiv Detail & Related papers (2023-04-19T12:28:42Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTecD-3 AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and puzzles the convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object
Tracking [23.130490413184596]
We introduce PointNet++ to obtain multi-scale deep representations of point cloud to make it adaptive to our proposed Interactive Feature Fusion.
Our method can achieve good performance on the KITTI benchmark and outperform other approaches without using multi-scale feature fusion.
arXiv Detail & Related papers (2022-03-30T13:00:27Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide the combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z) - VMLoc: Variational Fusion For Learning-Based Multimodal Camera
Localization [46.607930208613574]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works directly adapting the objective function of vanilla variational auto-encoder, we show how camera localization can be accurately estimated.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.