GEM: Glare or Gloom, I Can Still See You -- End-to-End Multimodal Object Detector
- URL: http://arxiv.org/abs/2102.12319v1
- Date: Wed, 24 Feb 2021 14:56:37 GMT
- Title: GEM: Glare or Gloom, I Can Still See You -- End-to-End Multimodal Object Detector
- Authors: Osama Mazhar, Jens Kober and Robert Babuska
- Abstract summary: We propose sensor-aware multi-modal fusion strategies for 2D object detection in harsh-lighting conditions.
Our network learns to estimate the measurement reliability of each sensor modality in the form of scalar weights and masks.
We show that the proposed strategies outperform the existing state-of-the-art methods on the FLIR-Thermal dataset.
- Score: 11.161639542268015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks designed for vision tasks are often prone to failure
when they encounter environmental conditions not covered by the training data.
Efficient fusion strategies for multi-sensor configurations can enhance the
robustness of the detection algorithms by exploiting redundancy from different
sensor streams. In this paper, we propose sensor-aware multi-modal fusion
strategies for 2D object detection in harsh-lighting conditions. Our network
learns to estimate the measurement reliability of each sensor modality in the
form of scalar weights and masks, without prior knowledge of the sensor
characteristics. The obtained weights are assigned to the extracted feature
maps which are subsequently fused and passed to the transformer encoder-decoder
network for object detection. This is critical in the case of asymmetric sensor
failures and to prevent any tragic consequences. Through extensive
experimentation, we show that the proposed strategies out-perform the existing
state-of-the-art methods on the FLIR-Thermal dataset, improving the mAP up-to
25.2%. We also propose a new "r-blended" hybrid depth modality for RGB-D
multi-modal detection tasks. Our proposed method also obtained promising
results on the SUNRGB-D dataset.
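The fusion mechanism described in the abstract -- a learned scalar reliability weight per sensor modality, applied to the extracted feature maps before fusion and detection -- can be illustrated with a minimal sketch. The PyTorch snippet below is an illustrative approximation under our own assumptions (module and variable names are hypothetical, not the authors' code); it shows only the scalar-weight path and omits the learned masks and the transformer encoder-decoder detector that would consume the fused features.

```python
import torch
import torch.nn as nn

class SensorAwareFusion(nn.Module):
    """Illustrative sketch of reliability-weighted multi-modal feature fusion.

    Each modality's feature map is scored by a small gating head; the scalar
    weights are normalised with a softmax and used to blend the feature maps.
    The fused map would then be passed to a transformer encoder-decoder
    detection head (omitted here).
    """

    def __init__(self, channels: int, num_modalities: int = 2):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),   # global context of one modality
                nn.Flatten(),
                nn.Linear(channels, 1),    # scalar reliability logit
            )
            for _ in range(num_modalities)
        )

    def forward(self, feats):
        # feats: list of [B, C, H, W] feature maps, one per sensor modality
        logits = torch.cat([g(f) for g, f in zip(self.gates, feats)], dim=1)  # [B, M]
        weights = torch.softmax(logits, dim=1)                                # [B, M]
        fused = sum(w.view(-1, 1, 1, 1) * f
                    for w, f in zip(weights.unbind(dim=1), feats))
        return fused, weights

# Usage sketch: fuse RGB and thermal backbone features.
rgb_feat = torch.randn(2, 256, 32, 32)      # hypothetical RGB backbone output
thermal_feat = torch.randn(2, 256, 32, 32)  # hypothetical thermal backbone output
fusion = SensorAwareFusion(channels=256, num_modalities=2)
fused, weights = fusion([rgb_feat, thermal_feat])
print(fused.shape, weights.shape)           # torch.Size([2, 256, 32, 32]) torch.Size([2, 2])
```

The abstract also mentions masks; a spatial (per-location) variant of the same gating idea would replace the scalar weights with broadcastable mask tensors, but that variant is not sketched here.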
Related papers
- Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images [15.12889076965307]
YOLOv7 one-stage detector is subjected to a novel meta-learning training framework.
This transformation allows the detector to adeptly address FSOD tasks while capitalizing on its inherent advantage of lightweight.
To validate the effectiveness of our proposed detector, we conducted performance comparisons with current state-of-the-art detectors.
arXiv Detail & Related papers (2024-04-29T04:56:52Z)
- Efficient Multi-Resolution Fusion for Remote Sensing Data with Label Uncertainty [0.7832189413179361]
This paper presents a new method for fusing multi-modal and multi-resolution remote sensor data without requiring pixel-level training labels.
The proposed method is based on binary fuzzy measures, which reduce the search space and significantly improve the efficiency of the MIMRF framework.
arXiv Detail & Related papers (2024-02-07T17:34:32Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse them in a common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- A Comprehensive Evaluation on Multi-channel Biometric Face Presentation Attack Detection [6.488575826304023]
Presentation attack detection (PAD) systems try to address the problem of spoofing (presentation) attacks against face recognition.
Lack of generalization and robustness continues to be a major concern.
We use a multi-channel convolutional network-based architecture with pixel-wise binary supervision.
arXiv Detail & Related papers (2022-02-21T15:04:39Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The CB-Fusion module enriches the semantic information of point features with image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules (a generic sketch of such a residual fusion block appears after this list).
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.
These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem.
We introduce an uncertainty-aware deep SOD network, and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)
- MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural Networks [1.1602089225841632]
We propose a deep-learning-based multi-stage network architecture called the Multi-Stage Depth Prediction Network (MSDPN).
MSDPN predicts a dense depth map from a 2D LiDAR and a monocular camera.
As verified experimentally, our network yields promising performance against state-of-the-art methods.
arXiv Detail & Related papers (2020-08-04T08:27:40Z)
- Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies on both public datasets and progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)
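Several of the listed papers (EPNet++ and EPMF in particular) fuse two-stream features with residual-style cross-modal modules, as referenced in the EPMF entry above. The snippet below is a generic, illustrative sketch of such a residual fusion block in PyTorch, not a reproduction of either paper's implementation; the module name, gating design, and tensor shapes are our own assumptions.

```python
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    """Generic residual cross-modal fusion block (illustrative only).

    Features from a secondary modality are projected, gated per location, and
    added to the primary stream as a residual, so the primary features are
    enriched rather than replaced.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # per-location gate in [0, 1]
        )

    def forward(self, primary: torch.Tensor, secondary: torch.Tensor) -> torch.Tensor:
        # primary, secondary: [B, C, H, W] feature maps from the two streams
        proj = self.project(secondary)
        gate = self.gate(torch.cat([primary, proj], dim=1))
        return primary + gate * proj  # gated residual injection

# Usage sketch: enrich point/LiDAR-stream features with image-stream features.
lidar_feat = torch.randn(2, 128, 64, 64)  # hypothetical LiDAR-stream features
image_feat = torch.randn(2, 128, 64, 64)  # hypothetical image-stream features
block = ResidualFusionBlock(channels=128)
fused = block(lidar_feat, image_feat)
print(fused.shape)  # torch.Size([2, 128, 64, 64])
```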
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.