Local Agnostic Video Explanations: a Study on the Applicability of
Removal-Based Explanations to Video
- URL: http://arxiv.org/abs/2401.11796v1
- Date: Mon, 22 Jan 2024 09:53:20 GMT
- Authors: F. Xavier Gaya-Morey, Jose M. Buades-Rubio, Cristina Manresa-Yee
- Abstract summary: We present a unified framework for local explanations in the video domain.
Our contributions include: (1) Extending a fine-grained explanation framework tailored for computer vision data, (2) Adapting six existing explanation techniques to work on video data, and (3) Conducting an evaluation and comparison of the adapted explanation methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable artificial intelligence techniques are becoming increasingly
important with the rise of deep learning applications in various domains. These
techniques aim to provide a better understanding of complex "black box" models
and enhance user trust while maintaining high learning performance. While many
studies have focused on explaining deep learning models in computer vision for
image input, video explanations remain relatively unexplored due to the
temporal dimension's complexity. In this paper, we present a unified framework
for local agnostic explanations in the video domain. Our contributions include:
(1) Extending a fine-grained explanation framework tailored for computer vision
data, (2) Adapting six existing explanation techniques to work on video data by
incorporating temporal information and enabling local explanations, and (3)
Conducting an evaluation and comparison of the adapted explanation methods
using different models and datasets. We discuss the possibilities and choices
involved in the removal-based explanation process for visual data. The
adaptation of six explanation methods for video is explained, with comparisons
to existing approaches. We evaluate the performance of the methods using
automated metrics and user-based evaluation, showing that 3D RISE, 3D LIME, and
3D Kernel SHAP outperform other methods. By decomposing the explanation process
into manageable steps, we facilitate the study of each choice's impact and
allow for further refinement of explanation methods to suit specific datasets
and models.
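The removal-based process the abstract describes (occlude spatio-temporal regions, re-query the model, and aggregate the resulting scores) can be sketched as follows. This is a minimal RISE-style illustration, not the authors' implementation; the model signature, grid size, and mask count are assumptions:

```python
import numpy as np

def rise_3d_saliency(model, video, n_masks=1000, p_keep=0.5,
                     grid=(4, 7, 7), seed=0):
    """Perturbation-based saliency for video in the spirit of 3D RISE.

    `model` is assumed to map a (T, H, W, C) float array to a scalar
    probability of the target class in [0, 1].
    """
    rng = np.random.default_rng(seed)
    t, h, w, _ = video.shape
    saliency = np.zeros((t, h, w))
    weight_sum = 0.0
    for _ in range(n_masks):
        # Sample a coarse binary spatio-temporal grid, then upsample it
        # to frame resolution so occluded regions are contiguous blobs.
        coarse = (rng.random(grid) < p_keep).astype(float)
        mask = np.repeat(np.repeat(np.repeat(coarse,
                         -(-t // grid[0]), axis=0),
                         -(-h // grid[1]), axis=1),
                         -(-w // grid[2]), axis=2)[:t, :h, :w]
        score = model(video * mask[..., None])  # re-query on the masked clip
        saliency += score * mask                # credit visible regions
        weight_sum += score
    return saliency / max(weight_sum, 1e-8)
```

Upsampling a coarse grid rather than masking individual voxels keeps the occluded regions contiguous, which is what makes the aggregated scores interpretable as a saliency volume.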
Related papers
- Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics [10.045644410833402]
We introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics.
We showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme.
LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-evaluation) dataset.
arXiv Detail & Related papers (2024-09-25T09:07:46Z)
- Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data [102.0069667710562]
This paper presents Open-VCLIP++, a framework that adapts CLIP to a strong zero-shot video classifier.
We demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data.
Our approach is evaluated on three widely used action recognition datasets.
arXiv Detail & Related papers (2023-10-08T04:46:43Z)
- Precise Benchmarking of Explainable AI Attribution Methods [0.0]
We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods.
Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations.
Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods.
arXiv Detail & Related papers (2023-08-06T17:03:32Z)
- Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
arXiv Detail & Related papers (2023-07-27T13:52:42Z)
- An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z)
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation [104.40114562948428]
In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation.
We propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain.
MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA.
arXiv Detail & Related papers (2022-12-02T17:29:32Z)
- Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection [12.754974372231647]
We propose an adaptive deep-learning based VIO method that reduces computational redundancy by opportunistically disabling the visual modality.
A Gumbel-Softmax trick is adopted to train the policy network to make the decision process differentiable for end-to-end system training.
Experiment results show that our method achieves a similar or even better performance than the full-modality baseline.
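The Gumbel-Softmax trick mentioned in that summary is a standard reparameterisation for making a discrete choice (here, enabling or disabling the visual modality) differentiable. The sketch below is a generic NumPy illustration under that assumption, not the paper's code:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable sample from a categorical distribution.

    Adding Gumbel noise to the logits and applying a temperature-scaled
    softmax yields a soft one-hot vector; gradients can flow through it,
    unlike through a hard argmax decision.
    """
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.random(logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())       # numerically stable softmax
    return y / y.sum()
```

At training time the soft sample is used so gradients reach the policy network's logits; at inference the hard argmax decision is taken instead.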
arXiv Detail & Related papers (2022-05-12T16:17:49Z)
- End-to-end video instance segmentation via spatial-temporal graph neural networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-03-07T05:38:08Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting [43.90848669491335]
We propose the Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark, which consists of two contributions.
Our challenging benchmark enables more insightful analysis into video inpainting methods and serves as an invaluable diagnostic tool for the field.
arXiv Detail & Related papers (2021-05-11T20:13:53Z)
- Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about specified spatial or temporal augmentation, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
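The positive/negative-pair paradigm described in that summary is typically trained with an InfoNCE-style objective; the sketch below is a generic NumPy illustration of that loss under this assumption, not the paper's "augmentation aware" method:

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE over a batch: row i of each view embeds the same clip under
    a different augmentation (positive pair); all other rows act as
    negatives."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # positives on diagonal
```

Pulling the diagonal (same-instance) similarities up while pushing the off-diagonal ones down is what forces the encoder to produce augmentation-invariant, instance-discriminative representations.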
arXiv Detail & Related papers (2021-04-01T16:48:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.