Local Agnostic Video Explanations: a Study on the Applicability of
Removal-Based Explanations to Video
- URL: http://arxiv.org/abs/2401.11796v1
- Date: Mon, 22 Jan 2024 09:53:20 GMT
- Title: Local Agnostic Video Explanations: a Study on the Applicability of
Removal-Based Explanations to Video
- Authors: F. Xavier Gaya-Morey, Jose M. Buades-Rubio, Cristina Manresa-Yee
- Abstract summary: We present a unified framework for local explanations in the video domain.
Our contributions include: (1) Extending a fine-grained explanation framework tailored for computer vision data, (2) Adapting six existing explanation techniques to work on video data, and (3) Conducting an evaluation and comparison of the adapted explanation methods.
- Score: 0.6906005491572401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable artificial intelligence techniques are becoming increasingly
important with the rise of deep learning applications in various domains. These
techniques aim to provide a better understanding of complex "black box" models
and enhance user trust while maintaining high learning performance. While many
studies have focused on explaining deep learning models in computer vision for
image input, video explanations remain relatively unexplored due to the
temporal dimension's complexity. In this paper, we present a unified framework
for local agnostic explanations in the video domain. Our contributions include:
(1) Extending a fine-grained explanation framework tailored for computer vision
data, (2) Adapting six existing explanation techniques to work on video data by
incorporating temporal information and enabling local explanations, and (3)
Conducting an evaluation and comparison of the adapted explanation methods
using different models and datasets. We discuss the possibilities and choices
involved in the removal-based explanation process for visual data. The
adaptation of six explanation methods for video is explained, with comparisons
to existing approaches. We evaluate the performance of the methods using
automated metrics and user-based evaluation, showing that 3D RISE, 3D LIME, and
3D Kernel SHAP outperform other methods. By decomposing the explanation process
into manageable steps, we facilitate the study of each choice's impact and
allow for further refinement of explanation methods to suit specific datasets
and models.
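As a rough illustration of the removal-based idea the paper builds on, the sketch below computes a 3D RISE-style saliency map: spatio-temporal blocks of the input video are randomly occluded, and each mask is weighted by the black-box model's output score. The `model` callable, the `rise_3d_saliency` name, the grid size, and the nearest-neighbor mask upsampling are assumptions for illustration only; the paper's actual adaptation (and the original RISE, which uses smoothed, randomly shifted masks) may differ.

```python
import numpy as np

def rise_3d_saliency(model, video, n_masks=1000, grid=(4, 7, 7),
                     p_keep=0.5, seed=0):
    """Hypothetical 3D RISE-style saliency sketch.

    Assumes `model(batch)` returns class probabilities for a batch of
    videos shaped (N, T, H, W, C); `video` is one (T, H, W, C) array.
    """
    rng = np.random.default_rng(seed)
    T, H, W, _ = video.shape
    saliency = np.zeros((T, H, W))

    for _ in range(n_masks):
        # Sample a coarse binary keep/remove mask over a small
        # (time, height, width) grid of spatio-temporal blocks.
        mask = (rng.random(grid) < p_keep).astype(np.float32)

        # Upsample the coarse grid to video resolution (nearest
        # neighbor here, for simplicity).
        for axis, size in enumerate((T, H, W)):
            mask = np.repeat(mask, -(-size // grid[axis]), axis=axis)
        mask = mask[:T, :H, :W]

        # "Remove" features by zeroing occluded voxels, then query
        # the black-box model on the perturbed video.
        score = model((video * mask[..., None])[None])[0].max()

        # Voxels kept in high-scoring masks accumulate saliency.
        saliency += score * mask

    # Normalize by the expected number of times each voxel was kept.
    return saliency / (n_masks * p_keep)
```

In a similar spirit, the same removal-and-scoring loop could be reframed as a 3D LIME or 3D Kernel SHAP variant by fitting a weighted linear surrogate on the sampled masks instead of averaging the scores.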
Related papers
- CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using a convolutional model.
Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover more coherent and reliable patterns of the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z) - AICL: Action In-Context Learning for Video Diffusion Model [124.39948693332552]
We propose AICL, which empowers the generative model with the ability to understand action information in reference videos.
Extensive experiments demonstrate that AICL effectively captures the action and achieves state-of-the-art generation performance.
arXiv Detail & Related papers (2024-03-18T07:41:19Z) - Explainability for Machine Learning Models: From Data Adaptability to
User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z) - A Hierarchical Graph-based Approach for Recognition and Description
Generation of Bimanual Actions in Videos [3.7486111821201287]
This study describes a novel method integrating graph-based modeling with layered hierarchical attention mechanisms.
The complexity of our approach is empirically tested using several 2D and 3D datasets.
arXiv Detail & Related papers (2023-10-01T13:45:48Z) - What and How of Machine Learning Transparency: Building Bespoke
Explainability Tools with Interoperable Algorithmic Components [77.87794937143511]
This paper introduces a collection of hands-on training materials for explaining data-driven predictive models.
These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
arXiv Detail & Related papers (2022-09-08T13:33:25Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models spatio-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z) - Memory-augmented Dense Predictive Coding for Video Representation
Learning [103.69904379356413]
We propose a new architecture and learning framework Memory-augmented Predictive Coding (MemDPC) for the task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z) - Explaining Motion Relevance for Activity Recognition in Video Deep
Learning Models [12.807049446839507]
Only a small subset of explainability techniques has been applied to interpret 3D Convolutional Neural Network models in activity recognition tasks.
We propose a selective relevance method for adapting the 2D explanation techniques to provide motion-specific explanations.
Our results show that the selective relevance method can not only provide insight into the role played by motion in the model's decision -- in effect, revealing and quantifying the model's spatial bias -- but also simplify the resulting explanations for human consumption.
arXiv Detail & Related papers (2020-03-31T15:19:04Z) - Object Relational Graph with Teacher-Recommended Learning for Video
Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model.
arXiv Detail & Related papers (2020-02-26T15:34:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.