Related papers: Deriving Explanation of Deep Visual Saliency Models

Deriving Explanation of Deep Visual Saliency Models

URL: http://arxiv.org/abs/2109.03575v1
Date: Wed, 8 Sep 2021 12:22:32 GMT
Title: Deriving Explanation of Deep Visual Saliency Models
Authors: Sai Phani Kumar Malladi, Jayanta Mukhopadhyay, Chaker Larabi, Santanu Chaudhury
Abstract summary: We develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models. We consider two state-of-the-art deep saliency models, namely UNISAL and MSI-Net for our interpretation. We also build our own deep saliency model named cross-concatenated multi-scale residual block based network (CMRNet) for saliency prediction.
Score: 6.808418311272862
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep neural networks have shown their profound impact on achieving human level performance in visual saliency prediction. However, it is still unclear how they learn the task and what it means in terms of understanding human visual system. In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models by applying human perception theories and the conventional concepts of saliency. This technique helps us understand the learning pattern of the deep network at its intermediate layers through their activation maps. Initially, we consider two state-of-the-art deep saliency models, namely UNISAL and MSI-Net for our interpretation. We use a set of biologically plausible log-gabor filters for identifying and reconstructing the activation maps of them using our explainable saliency model. The final saliency map is generated using these reconstructed activation maps. We also build our own deep saliency model named cross-concatenated multi-scale residual block based network (CMRNet) for saliency prediction. Then, we evaluate and compare the performance of the explainable models derived from UNISAL, MSI-Net and CMRNet on three benchmark datasets with other state-of-the-art methods. Hence, we propose that this approach of explainability can be applied to any deep visual saliency model for interpretation which makes it a generic one.

Related papers

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models [5.985204759362746]
We present a unified framework for transforming any vision neural network into a spatially and conceptually interpretable model. We name this method "Spatially-Aware and Label-Free Concept Bottleneck Model" (SALF-CBM)
arXiv Detail & Related papers (2025-02-27T14:27:55Z)
Generating visual explanations from deep networks using implicit neural representations [0.6056822594090163]
In this work, we demonstrate that implicit neural representations (INRs) constitute a good framework for generating visual explanations. We present an iterative INR-based method that can be used to generate multiple non-overlapping attribution masks for the same image.
arXiv Detail & Related papers (2025-01-20T23:17:57Z)
Language Model as Visual Explainer [72.88137795439407]
We present a systematic approach for interpreting vision models using a tree-structured linguistic explanation. Our method provides human-understandable explanations in the form of attribute-laden trees. To access the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations.
arXiv Detail & Related papers (2024-12-08T20:46:23Z)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification [19.306487616731765]
Post-hoc analysis can only discover the patterns or rules that naturally exist in models. We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers. Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
arXiv Detail & Related papers (2023-07-10T04:54:05Z)
Experimental Observations of the Topology of Convolutional Neural Network Activations [2.4235626091331737]
Topological data analysis provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification.
arXiv Detail & Related papers (2022-12-01T02:05:44Z)
Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction. Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image. Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
Perception Visualization: Seeing Through the Eyes of a DNN [5.9557391359320375]
We develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available.
arXiv Detail & Related papers (2022-04-21T07:18:55Z)
Multi-Semantic Image Recognition Model and Evaluating Index for explaining the deep learning models [31.387124252490377]
We first propose a multi-semantic image recognition model, which enables human beings to understand the decision-making process of the neural network. We then presents a new evaluation index, which can quantitatively assess the model interpretability. This paper also exhibits the relevant baseline performance with current state-of-the-art deep learning models.
arXiv Detail & Related papers (2021-09-28T07:18:05Z)
Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units. RBF units capture local patterns shared by similar instances using an intermediate representation. We show it is the incorporation of local information what makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks. We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption. To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation [22.688772441351308]
Methods based on class activation mapping and randomized input sampling have gained great popularity. However, the attribution methods provide lower resolution and blurry explanation maps that limit their explanation power. In this work, we collect visualization maps from multiple layers of the model based on an attribution-based input sampling technique. We also propose a layer selection strategy that applies to the whole family of CNN-based models.
arXiv Detail & Related papers (2020-10-01T20:27:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.