Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in
Autonomous Driving
- URL: http://arxiv.org/abs/2307.16121v1
- Date: Sun, 30 Jul 2023 04:00:41 GMT
- Title: Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in
Autonomous Driving
- Authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang
- Abstract summary: This paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion.
UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.
- Score: 8.991012799672713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal fusion has shown promising initial results for object detection
in autonomous driving perception. However, many existing fusion schemes do not
consider the quality of each fusion input and may suffer from adverse
conditions on one or more sensors. While predictive uncertainty has been
applied to characterize single-modal object detection performance at run time,
incorporating uncertainties into the multi-modal fusion still lacks effective
solutions due primarily to the uncertainty's cross-modal incomparability and
distinct sensitivities to various adverse conditions. To fill this gap, this
paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly
incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses
an individual expert network to process each sensor's detection result together
with encoded uncertainty. Then, the expert networks' outputs are analyzed by a
gating network to determine the fusion weights. The proposed UMoE module can be
integrated into any proposal fusion pipeline. Evaluation shows that UMoE
achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with
the state-of-the-art proposal-level multi-modal object detectors under extreme
weather, adversarial, and blinding attack scenarios.
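The expert-plus-gating structure described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the single-unit "experts", the hand-picked weights, the `GATE_SCALE` constant, and the `fuse` function are all hypothetical, chosen only to show how per-sensor uncertainty could shift fusion weights toward the more reliable sensor. Each input is assumed to be a (confidence, encoded uncertainty) pair in [0, 1] for one matched proposal.

```python
import math

# Hypothetical sharpness of the gate; not a value from the paper.
GATE_SCALE = 5.0


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expert(confidence, uncertainty, w_conf=4.0, w_unc=-4.0, bias=-1.0):
    """Toy expert: one linear unit plus a sigmoid.

    Hand-picked weights (an assumption) reward high detection confidence
    and penalize high encoded uncertainty.
    """
    z = w_conf * confidence + w_unc * uncertainty + bias
    return 1.0 / (1.0 + math.exp(-z))


def fuse(lidar, camera):
    """Fuse one matched proposal from two sensors.

    lidar, camera: (confidence, encoded_uncertainty) pairs.
    Returns the fused confidence and the [lidar, camera] fusion weights.
    """
    h_lidar = expert(*lidar)
    h_camera = expert(*camera)
    # Gating step: turn the expert outputs into weights that sum to 1.
    weights = softmax([GATE_SCALE * h_lidar, GATE_SCALE * h_camera])
    fused = weights[0] * lidar[0] + weights[1] * camera[0]
    return fused, weights


if __name__ == "__main__":
    # Camera is degraded (low confidence, high uncertainty), LiDAR is not.
    fused, weights = fuse(lidar=(0.9, 0.1), camera=(0.4, 0.8))
    print("fused confidence:", round(fused, 3))
    print("weights [lidar, camera]:", [round(w, 3) for w in weights])
```

In this sketch the gate favors the LiDAR branch when the camera reports high uncertainty, and splits the weight evenly when both sensors report the same values. A learned system would train the expert and gating parameters rather than fix them by hand.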
Related papers
- E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection [21.185032466325737]
We introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection.
E2E-MFD streamlines the process, achieving high performance with a single training phase.
Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities.
arXiv Detail & Related papers (2024-03-14T12:12:17Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical analysis of dynamic fusion, from the generalization perspective, under one of the most popular multimodal fusion frameworks.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the problem of fusing infrared and visible images, which differ in appearance, for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or with deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- CertainNet: Sampling-free Uncertainty Estimation for Object Detection [65.28989536741658]
Estimating the uncertainty of a neural network plays a fundamental role in safety-critical settings.
In this work, we propose a novel sampling-free uncertainty estimation method for object detection.
We call it CertainNet, and it is the first to provide separate uncertainties for each output signal: objectness, class, location and size.
arXiv Detail & Related papers (2021-10-04T17:59:31Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
- Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies on both public datasets and progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)