Implicit Saliency in Deep Neural Networks
- URL: http://arxiv.org/abs/2008.01874v1
- Date: Tue, 4 Aug 2020 23:14:24 GMT
- Title: Implicit Saliency in Deep Neural Networks
- Authors: Yutong Sun, Mohit Prabhushankar and Ghassan AlRegib
- Abstract summary: In this paper, we show that existing recognition and localization deep architectures are capable of predicting human visual saliency.
We calculate this implicit saliency using the expectancy-mismatch hypothesis in an unsupervised fashion.
Our experiments show that extracting saliency in this fashion provides performance comparable to state-of-the-art supervised algorithms.
- Score: 15.510581400494207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that existing recognition and localization deep
architectures, which have not been exposed to eye-tracking data or any saliency
datasets, are capable of predicting human visual saliency. We term this
implicit saliency in deep neural networks. We calculate this implicit saliency
using the expectancy-mismatch hypothesis in an unsupervised fashion. Our
experiments show that extracting saliency in this fashion provides performance
comparable to state-of-the-art supervised algorithms. Moreover, our method is
more robust than those algorithms when large noise is added to the input
images. We also show that semantic features contribute more than low-level
features to human visual saliency detection.
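As a rough illustration of the idea, the sketch below derives an unsupervised saliency map from a pretrained classifier by backpropagating an expectancy-mismatch signal to the input. The ResNet-50 backbone, the uniform "expected" distribution, and the KL mismatch loss are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch: gradient-based "implicit saliency" from a pretrained
# classifier, loosely following the expectancy-mismatch idea (no eye-tracking
# or saliency supervision). The choice of mismatch loss is an assumption.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

logits = model(img)
# "Expected" distribution: uniform over classes, standing in for the
# observer's prior expectation. The mismatch is its divergence from the
# network's actual prediction.
expected = torch.full_like(logits, 1.0 / logits.shape[1])
mismatch = F.kl_div(F.log_softmax(logits, dim=1), expected,
                    reduction="batchmean")
mismatch.backward()

# Pixels whose perturbation most changes the mismatch are read out as salient.
saliency = img.grad.abs().max(dim=1)[0].squeeze(0)  # HxW map
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```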
Related papers
- Exploring Geometry of Blind Spots in Vision Models [56.47644447201878]
We study the phenomenon of under-sensitivity in vision models such as CNNs and Transformers.
We propose a Level Set Traversal algorithm that iteratively explores regions of high confidence with respect to the input space.
We estimate the extent of these connected higher-dimensional regions over which the model maintains a high degree of confidence.
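A minimal sketch of one level-set traversal step, assuming the goal is to move an input toward a target image while holding the source-class confidence approximately constant; the projection rule and step size below are illustrative, not the paper's exact algorithm.

```python
# Hypothetical sketch of a level-set traversal step: move x toward x_target
# while removing the component of the step along the confidence gradient,
# so the source-class confidence stays (approximately) constant.
import torch
import torch.nn.functional as F

def level_set_step(model, x, x_target, source_class, lr=0.01):
    x = x.clone().detach().requires_grad_(True)
    conf = F.softmax(model(x), dim=1)[0, source_class]
    conf.backward()
    g = x.grad.flatten()
    d = (x_target - x).detach().flatten()            # desired direction
    d_tangent = d - (d @ g) / (g @ g + 1e-12) * g    # project onto level set
    return (x.detach().flatten() + lr * d_tangent).view_as(x)
```

Iterating such a step traces a high-confidence path between two images, probing the connected "blind spot" regions the paper describes.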
arXiv Detail & Related papers (2023-10-30T18:00:33Z)
- How deep convolutional neural networks lose spatial information with training [0.7328100870402177]
We show how stability to image diffeomorphisms is achieved by spatial pooling in the first half of the net, and by channel pooling in the second half.
We find that the increased sensitivity to noise is due to the perturbing noise piling up during pooling, after being rectified by ReLU units.
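A small diagnostic in this spirit, using a translation as a stand-in for a general diffeomorphism and matching the noise norm to the deformation norm; the specific metric is an assumption, not the paper's protocol.

```python
# Hypothetical diagnostic: compare how much a network's output moves under a
# small translation versus Gaussian noise of the same pixel-space norm.
import torch

def relative_sensitivity(model, x, shift=2):
    with torch.no_grad():
        y = model(x)
        x_shift = torch.roll(x, shifts=shift, dims=-1)  # small translation
        noise = torch.randn_like(x)
        noise *= (x_shift - x).norm() / noise.norm()    # match perturbation size
        d_diffeo = (model(x_shift) - y).norm()
        d_noise = (model(x + noise) - y).norm()
    return (d_diffeo / (d_noise + 1e-12)).item()
```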
arXiv Detail & Related papers (2022-10-04T10:21:03Z)
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits the semantic features of pretrained classification networks and implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
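A simplified sketch of matching in a pretrained network's semantic feature space, assuming per-channel mean/std statistics as a stand-in for D2SM's full distribution matching; the VGG-16 feature extractor is an illustrative choice.

```python
# Hypothetical sketch: penalize the distance between feature statistics of a
# denoised image and a clean image in a pretrained network's semantic feature
# space. Mean/std matching is a simplification of full distribution matching.
import torch
from torchvision import models

features = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16]
features.eval()
for p in features.parameters():
    p.requires_grad_(False)

def semantic_stats_loss(denoised, clean):
    f_d, f_c = features(denoised), features(clean)
    mu_d, mu_c = f_d.mean(dim=(2, 3)), f_c.mean(dim=(2, 3))
    sd_d, sd_c = f_d.std(dim=(2, 3)), f_c.std(dim=(2, 3))
    return (mu_d - mu_c).pow(2).mean() + (sd_d - sd_c).pow(2).mean()
```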
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks [6.734853055176694]
We examine the congruence, or lack thereof, between the information-gathering strategies of deep neural networks and the human visual system.
For correctly classified characters, the deep learning model attends to regions of the character similar to those on which humans fixate.
We propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions.
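One way such supervision might look, as a hedged sketch: align a gradient-based attention map with the human fixation map via a KL term. The attention proxy and the loss below are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: auxiliary loss aligning a model's gradient-based
# attention with a human fixation map (assumed shape (1, 1, H, W)).
import torch
import torch.nn.functional as F

def fixation_alignment_loss(model, image, label, fixation_map):
    image = image.clone().requires_grad_(True)
    logit = model(image)[0, label]
    grad, = torch.autograd.grad(logit, image, create_graph=True)
    attn = grad.abs().mean(dim=1, keepdim=True)      # crude attention map
    attn = F.interpolate(attn, size=fixation_map.shape[-2:],
                         mode="bilinear", align_corners=False)
    attn = attn / (attn.sum() + 1e-8)                # normalize to probabilities
    fix = fixation_map / (fixation_map.sum() + 1e-8)
    return F.kl_div(attn.clamp_min(1e-8).log().flatten(1), fix.flatten(1),
                    reduction="batchmean")
```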
arXiv Detail & Related papers (2021-08-10T10:09:37Z)
- Deep Feature Tracker: A Novel Application for Deep Convolutional Neural Networks [0.0]
We propose a novel and unified deep learning-based approach that can learn how to track features reliably.
The proposed network, dubbed Deep-PT, consists of a tracker network that is a convolutional neural network performing cross-correlation.
The network is trained on multiple datasets due to the lack of a specialized dataset for feature tracking.
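The cross-correlation core of such a tracker can be sketched as follows, in the style of Siamese trackers: one backbone embeds both a template patch and a search window, and their correlation map peaks at the feature's new location. The tiny backbone here is a placeholder.

```python
# Hypothetical sketch of the cross-correlation core of a CNN feature tracker.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                 # placeholder embedding network
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)

def track(template, search):
    z = backbone(template)                # (1, C, hz, wz)
    x = backbone(search)                  # (1, C, hx, wx)
    score = F.conv2d(x, z)                # cross-correlation response map
    flat = score.flatten(2).argmax(dim=-1)
    w = score.shape[-1]
    return (flat // w).item(), (flat % w).item()  # peak (row, col)

template = torch.randn(1, 3, 16, 16)
search = torch.randn(1, 3, 64, 64)
row, col = track(template, search)
```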
arXiv Detail & Related papers (2021-07-30T23:24:29Z)
- Predicting Depth from Semantic Segmentation using Game Engine Dataset [0.0]
This thesis investigates the relation between object perception and depth estimation in convolutional neural networks.
We developed new network structures based on a simple depth estimation network that uses only a single image as input.
Results show that our novel structures can improve depth estimation performance by 52% in terms of relative distance error.
arXiv Detail & Related papers (2021-06-12T10:15:40Z)
- Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
The resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
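In practice this amounts to fitting a regularized linear model on frozen deep features; the sketch below uses scikit-learn's L1-penalized logistic regression as a stand-in for the paper's sparse fit, with placeholder features.

```python
# Hypothetical sketch: fit a sparse (L1-regularized) linear classifier on
# frozen deep features; the few surviving weights per class point to the
# features driving each decision, which makes debugging tractable.
import numpy as np
from sklearn.linear_model import LogisticRegression

# deep_feats: (n_samples, n_features) penultimate-layer activations,
# labels: (n_samples,) class labels -- assumed precomputed elsewhere.
deep_feats = np.random.randn(1000, 512)   # placeholder features
labels = np.random.randint(0, 10, 1000)   # placeholder labels

clf = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=2000)
clf.fit(deep_feats, labels)

# Most coefficients are exactly zero; the nonzero ones name the deep
# features each class decision depends on.
nonzero_per_class = (clf.coef_ != 0).sum(axis=1)
print(nonzero_per_class)
```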
arXiv Detail & Related papers (2021-05-11T08:15:25Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
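One common form of such a prior is a known camera height above the ground plane, which fixes the global scale of an otherwise scale-ambiguous depth map; the recipe below illustrates the idea and is not necessarily this paper's method.

```python
# Hypothetical sketch: resolve the global scale of a self-supervised depth
# map using a known camera height. Ground pixels are assumed given (e.g.,
# from segmentation); fy and cy are camera intrinsics.
import numpy as np

def calibrate_scale(depth, ground_mask, fy, cy, camera_height_m):
    h, w = depth.shape
    v = np.broadcast_to(np.arange(h)[:, None], depth.shape)  # pixel row index
    # Back-project ground pixels to camera-frame heights (up to scale):
    # y = (v - cy) / fy * depth, with the camera y-axis pointing down.
    y = (v[ground_mask] - cy) / fy * depth[ground_mask]
    est_height = np.median(y)             # estimated camera height, unscaled
    return camera_height_m / est_height   # multiply depth by this factor
```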
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- Interpretation of Deep Temporal Representations by Selective Visualization of Internally Activated Nodes [24.228613156037532]
We propose two new frameworks to visualize temporal representations learned from deep neural networks.
Our algorithm interprets the decisions of a temporal neural network by extracting its highly activated periods.
We characterize the corresponding sub-sequences by clustering and calculate the uncertainty between each suggested type and the actual data.
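A minimal sketch of the extraction-and-clustering step, assuming a simple activation threshold and k-means over fixed-length windows; both choices are illustrative.

```python
# Hypothetical sketch: extract the time spans where one hidden unit's
# activation exceeds a threshold, then cluster the corresponding input
# sub-sequences to characterize what the unit responds to.
import numpy as np
from sklearn.cluster import KMeans

def highly_activated_periods(activations, threshold):
    """Return (start, end) index pairs where activations > threshold."""
    above = activations > threshold
    edges = np.flatnonzero(np.diff(above.astype(int)))
    bounds = np.concatenate([[0], edges + 1, [len(above)]])
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if above[s]]

acts = np.random.rand(500)                 # placeholder unit activations
series = np.random.rand(500)               # placeholder input sequence
periods = highly_activated_periods(acts, threshold=0.9)

# Fixed-length windows at each period's onset, then k-means over the windows.
L = 20
windows = [series[s:s + L] for s, _ in periods if s + L <= len(series)]
if len(windows) >= 3:
    clusters = KMeans(n_clusters=3, n_init=10).fit_predict(np.stack(windows))
```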
arXiv Detail & Related papers (2020-04-27T01:45:55Z)
- DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.