Spatio-Temporal Perturbations for Video Attribution
- URL: http://arxiv.org/abs/2109.00222v1
- Date: Wed, 1 Sep 2021 07:44:16 GMT
- Title: Spatio-Temporal Perturbations for Video Attribution
- Authors: Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato
- Abstract summary: The attribution method provides a direction for interpreting opaque neural networks in a visual way.
We investigate a generic perturbation-based attribution method that is compatible with diversified video understanding networks.
We introduce reliable objective metrics which are checked by a newly proposed reliability measurement.
- Score: 33.19422909074655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The attribution method provides a direction for interpreting opaque neural
networks in a visual way by identifying and visualizing the input
regions/pixels that dominate the output of a network. Regarding the attribution
method for visually explaining video understanding networks, it is challenging
because of the unique spatiotemporal dependencies existing in video inputs and
the special 3D convolutional or recurrent structures of video understanding
networks. However, most existing attribution methods focus on explaining
networks that take a single image as input, and the few works specifically
devised for video attribution fall short of handling the diversified
structures of video understanding networks. In this paper, we investigate a generic
perturbation-based attribution method that is compatible with diversified video
understanding networks. Besides, we propose a novel regularization term to
enhance the method by constraining the smoothness of its attribution results in
both spatial and temporal dimensions. In order to assess the effectiveness of
different video attribution methods without relying on manual judgement, we
introduce reliable objective metrics which are checked by a newly proposed
reliability measurement. We verified the effectiveness of our method by both
subjective and objective evaluation and comparison with multiple significant
attribution methods.
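The spatial and temporal smoothness constraint described in the abstract can be illustrated with a minimal sketch. The mask layout `(T, H, W)`, the zero baseline, and the function names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spatiotemporal_tv(mask):
    """Smoothness regularizer: total variation of a (T, H, W) attribution
    mask along the temporal, height, and width dimensions."""
    dt = np.abs(np.diff(mask, axis=0)).sum()  # temporal smoothness
    dh = np.abs(np.diff(mask, axis=1)).sum()  # spatial smoothness (rows)
    dw = np.abs(np.diff(mask, axis=2)).sum()  # spatial smoothness (cols)
    return dt + dh + dw

def perturb(video, mask, baseline=None):
    """Blend a (T, H, W, C) video with a baseline (here: zeros) according
    to the mask; regions where the mask is near 0 are suppressed."""
    if baseline is None:
        baseline = np.zeros_like(video)
    return mask[..., None] * video + (1 - mask[..., None]) * baseline
```

In an actual perturbation-based attribution pipeline, the mask would be optimized so that the perturbed video maximally changes the network's output while the mask stays small and smooth; a regularizer like the one above would be one term of that objective.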
Related papers
- Shap-CAM: Visual Explanations for Convolutional Neural Networks based on
Shapley Value [86.69600830581912]
We develop a novel visual explanation method called Shap-CAM based on class activation mapping.
We demonstrate that Shap-CAM achieves better visual performance and fairness for interpreting the decision making process.
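The Shapley value underlying Shap-CAM averages each player's marginal contribution over all orderings of the players. A generic, exact version for a small player set is sketched below; this is the textbook definition, not Shap-CAM's (necessarily approximate) computation over activation maps:

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    to the coalition value over all orderings of the players."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        prev = value(frozenset())
        coalition = set()
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            phi[p] += cur - prev  # marginal contribution of p in this order
            prev = cur
    n_fact = factorial(len(players))
    return {p: v / n_fact for p, v in phi.items()}
```

For an additive game, each player's Shapley value equals its own weight, which makes this a convenient sanity check; real attribution settings use sampled orderings because the exact sum is exponential in the number of players.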
arXiv Detail & Related papers (2022-08-07T00:59:23Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-size temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better distinguish between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Self-Supervised Video Representation Learning by Video Incoherence
Detection [28.540645395066434]
This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning.
It stems from the observation that the visual system of human beings can easily identify video incoherence based on a comprehensive understanding of videos.
arXiv Detail & Related papers (2021-09-26T04:58:13Z) - Recurrent Neural Networks for video object detection [0.0]
This work compares different methods, especially those which use Recurrent Neural Networks to detect objects in videos.
We distinguish between feature-based methods, which feed feature maps of different frames into the recurrent units; box-level methods, which feed bounding boxes with class probabilities into the recurrent units; and methods which use flow networks.
arXiv Detail & Related papers (2020-10-29T16:40:10Z) - Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction.
It stems from the observation that human visual system is sensitive to video pace.
We randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip.
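The clip sampling step described above can be sketched as follows; the pace set, frame-index layout, and function name are assumptions for illustration, not the paper's code:

```python
import random

def sample_pace_clip(num_frames, clip_len, paces=(1, 2, 4)):
    """Sample frame indices for one training clip at a randomly chosen
    pace (frame stride). The index of the chosen pace serves as the
    self-supervised pretext label the network must predict."""
    label = random.randrange(len(paces))
    pace = paces[label]
    # latest start position that keeps the whole clip inside the video
    max_start = num_frames - (clip_len - 1) * pace - 1
    start = random.randrange(max_start + 1)
    indices = [start + i * pace for i in range(clip_len)]
    return indices, label
```

A training loop would decode the frames at `indices`, feed the clip to the network, and use `label` as the classification target, so no manual annotation is required.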
arXiv Detail & Related papers (2020-08-13T12:40:24Z) - Unsupervised Learning of Video Representations via Dense Trajectory
Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z) - Towards Visually Explaining Video Understanding Networks with
Perturbation [26.251944509485714]
We investigate a generic perturbation-based method for visually explaining video understanding networks.
We propose a novel loss function to enhance the method by constraining the smoothness of its results in both spatial and temporal dimensions.
arXiv Detail & Related papers (2020-05-01T13:41:38Z) - Assessing the Reliability of Visual Explanations of Deep Models with
Adversarial Perturbations [15.067369314723958]
We propose an objective measure to evaluate the reliability of explanations of deep models.
Our approach is based on changes in the network's outcome resulting from the perturbation of input images in an adversarial way.
We also propose a straightforward application of our approach to clean relevance maps, creating more interpretable maps without any loss in essential explanation.
arXiv Detail & Related papers (2020-04-22T19:57:34Z) - Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.