A Comparison of Deep Saliency Map Generators on Multispectral Data in
Object Detection
- URL: http://arxiv.org/abs/2108.11767v1
- Date: Thu, 26 Aug 2021 12:56:49 GMT
- Title: A Comparison of Deep Saliency Map Generators on Multispectral Data in
Object Detection
- Authors: Jens Bayer, David Münch, Michael Arens
- Abstract summary: This work investigates three saliency map generators, examining how their maps differ across the different spectra.
As a practical problem, we chose object detection in the infrared and visual spectrum for autonomous driving.
The results show that there are differences between the infrared and visual activation maps.
Further, training with both the infrared and visual data not only improves the network's output, it also leads to more focused spots in the saliency maps.
- Score: 9.264502124445348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks, especially convolutional deep neural networks, are
state-of-the-art methods for classifying, segmenting, or even generating images,
movies, or sounds. However, these methods lack a good semantic understanding of
what happens internally. The question of why a COVID-19 detector has classified
a stack of lung CT images as positive is sometimes more interesting than the
overall specificity and sensitivity, especially when human domain expert
knowledge disagrees with the given output. In such cases, human domain experts
can be advised to reconsider their decision in light of the information pointed
out by the system. In addition, the deep learning model can be verified, and
any bias present in the dataset can be uncovered. Currently, most explainable
AI methods in the computer vision domain are applied only to image
classification, where the images are ordinary images in the visible spectrum.
As a result, there is no comparison of how these methods behave on multimodal
image data, and most of them have not been investigated for object detection.
This work tries to close these gaps. Firstly, we investigate how the maps of
three saliency map generators differ across the different spectra; this is
achieved via accurate and systematic training. Secondly, we examine how the
methods behave when used for object detection. As a practical problem, we chose
object detection in the infrared and visual spectrum for autonomous driving.
The dataset used in this work is the Multispectral Object Detection Dataset, in
which each scene is available in the FIR, MIR, and NIR as well as the visual
spectrum. The results show that there are differences between the infrared and
visual activation maps. Further, training with both the infrared and visual
data not only improves the network's output, it also leads to more focused
spots in the saliency maps.
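To make the comparison above concrete, here is a minimal sketch of one widely used saliency map generator (Grad-CAM) applied to paired visible and infrared inputs. Everything here is illustrative: the paper's actual generators, detector network, and data are not reproduced; a plain ResNet-50 classifier with random weights and random tensors stand in for them.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
target_layer = model.layer4[-1]  # last residual block; a common CAM target

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an HxW saliency map in [0, 1] for a (1, 3, H, W) input."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    acts, grads = activations["value"], gradients["value"]
    weights = grads.mean(dim=(2, 3), keepdim=True)  # GAP over gradient maps
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8))[0, 0]

# Stand-ins for a registered visible frame and a 1-channel infrared frame
# (replicated to 3 channels so the same backbone accepts both spectra).
visual = torch.rand(1, 3, 224, 224)
infrared = torch.rand(1, 1, 224, 224).repeat(1, 3, 1, 1)
cam_vis = grad_cam(visual, class_idx=0)
cam_ir = grad_cam(infrared, class_idx=0)
print(float((cam_vis - cam_ir).abs().mean()))  # crude per-spectrum difference
```

Comparing cam_vis and cam_ir for the same scene is the kind of per-spectrum difference the paper quantifies; with a detector, the backpropagated score would be a box confidence rather than a class logit.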
Related papers
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
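As a rough illustration of the DHE entry above, the sketch below shows a transformer head that reads a dense backbone feature map and regresses the eight values of a 4-point homography parameterization. The layer sizes, pooling, and the 4-point parameterization are common-practice assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HomographyHead(nn.Module):
    """Transformer head: dense feature map -> 4 corner displacements."""

    def __init__(self, channels: int = 256, max_tokens: int = 64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.pos = nn.Parameter(torch.zeros(1, max_tokens, channels))
        self.regress = nn.Linear(channels, 8)  # 4 corners x (dx, dy)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape  # assumes h * w <= max_tokens
        tokens = feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = self.encoder(tokens + self.pos[:, : h * w])
        offsets = self.regress(tokens.mean(dim=1))    # pool, then regress
        return offsets.view(b, 4, 2)  # displacements defining the homography

# Stand-in for backbone features of a query/reference image pair.
feat = torch.rand(2, 256, 8, 8)
print(HomographyHead()(feat).shape)  # torch.Size([2, 4, 2])
```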
- Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z)
- Semantic Segmentation for Thermal Images: A Comparative Survey [0.0]
Using the infrared spectrum in semantic segmentation has many real-world use cases, such as autonomous driving, medical imaging, agriculture, and the defense industry.
One approach is to use both visible and infrared spectrum images as inputs.
Another approach is to use only thermal images, which reduces hardware cost for smaller use cases.
arXiv Detail & Related papers (2022-05-26T11:32:15Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images, which exhibit different appearances, for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse them in a common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
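For the TarDAL entry above, the following is a deliberately flattened, single-level stand-in for the bilevel idea: a small fusion network is trained so that its fused image both stays close to the source modalities and serves a downstream detection loss. The tiny networks, the stand-in detection loss, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny fusion net: concat(IR, visible) -> single-channel fused image.
fusion = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)
detector_head = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for a real detector

opt = torch.optim.Adam(
    list(fusion.parameters()) + list(detector_head.parameters()), lr=1e-3)

ir = torch.rand(2, 1, 64, 64)        # infrared frames (fake data)
vis = torch.rand(2, 3, 64, 64)       # visible frames (fake data)
heat_gt = torch.rand(2, 1, 64, 64)   # fake objectness target

for step in range(3):
    fused = fusion(torch.cat([ir, vis], dim=1))
    # Fusion objective: stay close to both source modalities.
    loss_fuse = ((fused - ir).abs().mean()
                 + (fused - vis.mean(dim=1, keepdim=True)).abs().mean())
    # Detection objective: the fused image must also serve the detector.
    loss_det = F.mse_loss(detector_head(fused), heat_gt)
    loss = loss_fuse + loss_det
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))
```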
- Multispectral Satellite Data Classification using Soft Computing Approach [5.3971200250581814]
We propose a grid-density based clustering technique for identification of objects.
We introduce an approach to classify satellite image data using a rule-induction-based machine learning algorithm.
arXiv Detail & Related papers (2022-03-21T17:25:09Z)
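A minimal sketch of generic grid-density clustering as named in the entry above (the rule-induction classifier is omitted): points are binned into grid cells, cells above a density threshold are kept, and touching dense cells are merged into clusters. Cell size and threshold are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def grid_density_cluster(points: np.ndarray, cell: float = 1.0,
                         min_count: int = 3) -> np.ndarray:
    """Label 2-D points by connecting adjacent dense grid cells (0 = sparse)."""
    ij = np.floor(points / cell).astype(int)
    ij -= ij.min(axis=0)                        # shift to non-negative indices
    counts = np.zeros(ij.max(axis=0) + 1, dtype=int)
    np.add.at(counts, (ij[:, 0], ij[:, 1]), 1)  # points per grid cell
    labels, _ = ndimage.label(counts >= min_count)  # merge touching dense cells
    return labels[ij[:, 0], ij[:, 1]]

# Two synthetic blobs; each should come back as its own cluster label.
pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 8.0])
print(np.unique(grid_density_cluster(pts)))
```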
- Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
Recent approaches mostly adopt sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images.
We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
- Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks [6.734853055176694]
We examine the congruence, or lack thereof, between the information-gathering strategies of deep neural networks and those of the human visual system.
For correctly classified characters, the deep learning model attended to the same character regions that humans fixated on.
We propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions.
arXiv Detail & Related papers (2021-08-10T10:09:37Z)
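The entry above proposes using human fixation maps as a supervisory input. One common way to realize that, sketched here under the assumption of a KL-divergence alignment term (not necessarily the authors' formulation), is an auxiliary loss that pulls the model's saliency distribution toward the eye-tracking map.

```python
import torch
import torch.nn.functional as F

def fixation_alignment_loss(model_saliency: torch.Tensor,
                            fixation_map: torch.Tensor) -> torch.Tensor:
    """KL(human fixation distribution || model saliency) for (B, 1, H, W) maps."""
    p = fixation_map.flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + 1e-8)     # normalize human map
    log_q = F.log_softmax(model_saliency.flatten(1), dim=1)
    return F.kl_div(log_q, p, reduction="batchmean")

saliency = torch.rand(4, 1, 32, 32)   # e.g. a CAM-style map from the model
fixations = torch.rand(4, 1, 32, 32)  # eye-tracking heat map (fake data)
print(float(fixation_alignment_loss(saliency, fixations)))
```

In practice such a term would be added, with a weighting factor, to the main classification loss during training.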
- Computational efficient deep neural network with difference attention maps for facial action unit detection [3.73202122588308]
We propose a computationally efficient end-to-end trainable deep neural network (CEDNN) model and spatial attention maps based on difference images.
Extensive experimental results show that the proposed CEDNN clearly outperforms traditional deep learning methods on the DISFA+ and CK+ datasets.
arXiv Detail & Related papers (2020-11-24T13:34:58Z)
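For the CEDNN entry above, this sketch shows the core mechanism it names: a spatial attention map computed from a difference image and used to gate feature maps. The per-image min-max normalization and the multiplicative gating are common-practice assumptions rather than the paper's exact design.

```python
import torch

def difference_attention(neutral: torch.Tensor,
                         expressive: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) image pair -> (B, 1, H, W) attention map in [0, 1]."""
    diff = (expressive - neutral).abs().mean(dim=1, keepdim=True)
    flat = diff.flatten(1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (diff - lo) / (hi - lo + 1e-8)          # per-image min-max scaling

neutral = torch.rand(2, 3, 64, 64)      # neutral face (fake data)
expressive = torch.rand(2, 3, 64, 64)   # expressive face (fake data)
attn = difference_attention(neutral, expressive)
features = torch.rand(2, 16, 64, 64)    # some CNN feature maps
print((features * attn).shape)          # attention-gated features
```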
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
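The depth-based people detector above outputs a likelihood map in image coordinates. A standard way to turn such a map into detections, sketched here with an assumed max-pooling non-maximum suppression and threshold, is to keep local maxima above a confidence level.

```python
import torch
import torch.nn.functional as F

def peaks_from_likelihood(likelihood: torch.Tensor, thresh: float = 0.8):
    """(1, 1, H, W) likelihood map -> (N, 2) tensor of (row, col) peaks."""
    pooled = F.max_pool2d(likelihood, kernel_size=3, stride=1, padding=1)
    is_peak = (likelihood == pooled) & (likelihood > thresh)
    return is_peak[0, 0].nonzero()

heat = torch.zeros(1, 1, 64, 64)
heat[0, 0, 20, 30] = 0.95  # fake person response
heat[0, 0, 45, 10] = 0.90  # another fake person response
print(peaks_from_likelihood(heat))  # tensor([[20, 30], [45, 10]])
```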
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-focus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.