Towards Visually Explaining Video Understanding Networks with
Perturbation
- URL: http://arxiv.org/abs/2005.00375v2
- Date: Mon, 9 Nov 2020 15:30:07 GMT
- Title: Towards Visually Explaining Video Understanding Networks with
Perturbation
- Authors: Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato
- Abstract summary: We investigate a generic perturbation-based method for visually explaining video understanding networks.
We propose a novel loss function to enhance the method by constraining the smoothness of its results in both spatial and temporal dimensions.
- Score: 26.251944509485714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Making black box models explainable" is a vital problem that accompanies
the development of deep learning networks. For networks taking visual
information as input, one basic but challenging explanation method is to
identify and visualize the input pixels/regions that dominate the network's
prediction. However, most existing works focus on explaining networks that take a
single image as input and do not consider the temporal relationships that exist
in videos. Providing an easy-to-use visual explanation method applicable to the
diverse structures of video understanding networks remains an open challenge.
In this paper, we investigate a generic
perturbation-based method for visually explaining video understanding networks.
In addition, we propose a novel loss function that enhances the method by
constraining the smoothness of its results in both the spatial and temporal
dimensions. The method makes it possible to compare explanation results across
different network structures, and it also avoids generating pathological
adversarial explanations for video inputs. Experimental comparisons verify the
effectiveness of our method.
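The abstract describes the method only at a high level, but the core recipe of
perturbation-based video explanation can be sketched: optimize a soft
spatio-temporal mask that deletes evidence from the video, while a smoothness
loss regularizes the mask in both space and time. The following is a minimal,
hypothetical PyTorch sketch, not the authors' implementation; the
blurred-reference perturbation, the total-variation-style smoothness terms, and
all weights (`l1_w`, `tv_space_w`, `tv_time_w`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def explain_video(model, video, target_class, steps=300, lr=0.05,
                  l1_w=0.1, tv_space_w=0.2, tv_time_w=0.2):
    """Optimize a soft spatio-temporal mask that, when used to delete
    evidence from the video, minimizes the target-class score.

    video: tensor of shape (1, C, T, H, W); model maps it to class logits.
    Assumes the model is in eval mode with frozen weights.
    """
    # A heavily blurred copy serves as the "uninformative" reference.
    blurred = F.avg_pool3d(video, kernel_size=(1, 11, 11),
                           stride=1, padding=(0, 5, 5))

    # One mask value per (frame, pixel); sigmoid keeps it in (0, 1).
    mask_logits = torch.zeros(1, 1, *video.shape[2:], requires_grad=True)
    opt = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        m = torch.sigmoid(mask_logits)
        perturbed = m * blurred + (1 - m) * video  # m = 1 deletes evidence

        score = F.softmax(model(perturbed), dim=1)[0, target_class]
        l1 = m.mean()  # keep the deleted region small
        # Total-variation penalties: spatial and temporal smoothness.
        tv_space = ((m[..., 1:, :] - m[..., :-1, :]).abs().mean()
                    + (m[..., :, 1:] - m[..., :, :-1]).abs().mean())
        tv_time = (m[:, :, 1:] - m[:, :, :-1]).abs().mean()

        loss = score + l1_w * l1 + tv_space_w * tv_space + tv_time_w * tv_time
        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask_logits).detach()  # saliency volume over (T, H, W)
```

The temporal total-variation term is what separates the video setting from
single-image perturbation methods: it discourages masks that flicker from
frame to frame, one source of the pathological adversarial explanations the
abstract mentions.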
Related papers
- Don't trust your eyes: on the (un)reliability of feature visualizations [25.018840023636546]
We show how to trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input.
We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks.
This can be used as a sanity check for feature visualizations.
arXiv Detail & Related papers (2023-06-07T18:31:39Z) - Shap-CAM: Visual Explanations for Convolutional Neural Networks based on
Shapley Value [86.69600830581912]
We develop a novel visual explanation method called Shap-CAM based on class activation mapping.
We demonstrate that Shap-CAM achieves better visual performance and fairness for interpreting the decision making process.
arXiv Detail & Related papers (2022-08-07T00:59:23Z) - Learning with Capsules: A Survey [73.31150426300198]
Capsule networks were proposed as an alternative approach to Convolutional Neural Networks (CNNs) for learning object-centric representations.
Unlike CNNs, capsule networks are designed to explicitly model part-whole hierarchical relationships.
arXiv Detail & Related papers (2022-06-06T15:05:36Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is used to combine low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Self-Supervised Video Representation Learning by Video Incoherence
Detection [28.540645395066434]
This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning.
It stems from the observation that the human visual system can easily identify video incoherence based on its comprehensive understanding of videos.
arXiv Detail & Related papers (2021-09-26T04:58:13Z) - Spatio-Temporal Perturbations for Video Attribution [33.19422909074655]
Attribution methods provide a way to interpret opaque neural networks visually.
We investigate a generic perturbation-based attribution method that is compatible with diverse video understanding networks.
We introduce objective metrics whose reliability is verified by a newly proposed reliability measurement.
arXiv Detail & Related papers (2021-09-01T07:44:16Z) - On the Post-hoc Explainability of Deep Echo State Networks for Time
Series Forecasting, Image and Video Classification [63.716247731036745]
Echo State Networks have attracted considerable attention over time, mainly due to the simplicity and computational efficiency of their learning algorithm.
This work addresses this issue by conducting an explainability study of Echo State Networks when applied to learning tasks with time series, image and video data.
Specifically, the study proposes three different techniques capable of eliciting understandable information about the knowledge grasped by these recurrent models.
arXiv Detail & Related papers (2021-02-17T08:56:33Z) - Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
This paper addresses the problem of self-supervised video representation learning from a new perspective: video pace prediction.
It stems from the observation that the human visual system is sensitive to video pace.
We randomly sample training clips at different paces and ask a neural network to identify the pace of each clip (see the sketch after this list).
arXiv Detail & Related papers (2020-08-13T12:40:24Z) - Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
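As referenced in the pace-prediction entry above, that pretext task can be
sketched in a few lines: sample clips at different playback speeds and train
the network to classify the speed. This is a minimal, hypothetical sketch; the
pace set, clip length, and helper names (`PACES`, `sample_paced_clip`) are
assumptions for illustration, not the paper's configuration.

```python
import random

PACES = [1, 2, 4, 8]  # assumed playback speeds; the label is an index into this list

def sample_paced_clip(video, clip_len=16):
    """Sample a clip from `video` (a (T, C, H, W) tensor or array) at a random pace.

    A pace of p keeps every p-th frame, so the clip spans p * clip_len frames
    of the original video. Assumes T >= clip_len * max(PACES).
    Returns (clip, pace_label).
    """
    label = random.randrange(len(PACES))
    pace = PACES[label]
    span = clip_len * pace
    start = random.randint(0, video.shape[0] - span)
    clip = video[start:start + span:pace]  # every pace-th frame
    return clip, label

# Training step (schematic): the backbone must infer the pace from motion alone,
# e.g. logits = pace_head(backbone(clip)); loss = cross_entropy(logits, label).
```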