How to Squeeze An Explanation Out of Your Model
- URL: http://arxiv.org/abs/2412.05134v1
- Date: Fri, 06 Dec 2024 15:47:53 GMT
- Title: How to Squeeze An Explanation Out of Your Model
- Authors: Tiago Roxo, Joana C. Costa, Pedro R. M. Inácio, Hugo Proença
- Abstract summary: This paper proposes an approach for interpretability that is model-agnostic.
By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features.
Results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings.
- Score: 13.154512864498912
- Abstract: Deep learning models are widely used nowadays for their reliability in performing various tasks. However, they do not typically provide the reasoning behind their decisions, which is a significant drawback, particularly in more sensitive areas such as biometrics, security and healthcare. The most commonly used approaches to provide interpretability create visual attention heatmaps of regions of interest on an image, based on a model's gradient backpropagation. Although this is a viable approach, current methods are targeted toward image settings and default/standard deep learning models, meaning that they require significant adaptations to work on video/multi-modal settings and custom architectures. This paper proposes a model-agnostic approach for interpretability, based on a novel use of the Squeeze and Excitation (SE) block, that creates visual attention heatmaps. By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features via manipulation of the SE vector, one of the key components of the SE block. Our results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings, namely biometrics of facial features with CelebA and behavioral biometrics using Active Speaker Detection datasets. Furthermore, our proposal does not compromise model performance on the original task, and achieves competitive results against current interpretability approaches on state-of-the-art object datasets, highlighting its robustness across varied data beyond the biometric context.
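The abstract gives enough detail to sketch the idea, even though the paper's own code is not reproduced here. Below is a minimal, hypothetical PyTorch illustration: an SE block is inserted between any backbone and the classification layer, and the excitation (SE) vector is exposed so the excitation-weighted feature maps can be collapsed into a visual attention heatmap. The class names, reduction ratio, pooling choices, and heatmap recipe are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-Excitation block with the excitation vector exposed
    so it can be reused for interpretability (illustrative sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor):
        # Squeeze: global average pooling reduces each channel to a scalar.
        s = x.mean(dim=(2, 3))                                # (B, C)
        # Excitation: bottleneck MLP yields per-channel weights in (0, 1).
        e = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))      # (B, C)
        # Recalibration: rescale each feature map by its excitation weight.
        return x * e.unsqueeze(-1).unsqueeze(-1), e


class SEInterpretableModel(nn.Module):
    """Any backbone -> SE block -> classification layer, as described."""

    def __init__(self, backbone: nn.Module, channels: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.se = SEBlock(channels)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)                              # (B, C, H, W)
        recalibrated, excitation = self.se(feats)
        logits = self.classifier(recalibrated.mean(dim=(2, 3)))
        # Heatmap (assumed recipe): collapse the excitation-weighted
        # feature maps over channels, then upsample to input resolution.
        heatmap = recalibrated.sum(dim=1, keepdim=True)       # (B, 1, H, W)
        heatmap = F.interpolate(heatmap, size=x.shape[2:],
                                mode="bilinear", align_corners=False)
        return logits, heatmap


# Usage with a toy backbone (hypothetical; any feature extractor works):
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model = SEInterpretableModel(backbone, channels=64, num_classes=2)
logits, heatmap = model(torch.randn(1, 3, 224, 224))
```

Because the SE block only reweights the backbone's own feature maps, this recipe needs no gradient backpropagation and no architecture-specific hooks, which is what makes the approach model-agnostic.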
Related papers
- Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement [18.839406725114042]
We present Concept Anchor-guided Task-specific Feature Enhancement (CATE).
CATE can boost the expressivity and discriminativeness of pathology foundation models for specific downstream tasks.
Experiments on public WSI datasets demonstrate that CATE significantly enhances the performance and generalizability of MIL models.
arXiv Detail & Related papers (2024-11-15T02:38:00Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models [13.972809192907931]
Foundation models (FMs) are large neural networks trained on broad datasets.
Human activity recognition in video has advanced with FMs, driven by competition among different architectures.
This paper empirically evaluates how perspective changes affect different FMs in fine-grained human activity recognition.
arXiv Detail & Related papers (2024-07-22T12:59:57Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- The Importance of Downstream Networks in Digital Pathology Foundation Models [1.689369173057502]
We evaluate seven feature extractor models across three different datasets with 162 different aggregation model configurations.
We find that the performance of many current feature extractor models is notably similar.
arXiv Detail & Related papers (2023-11-29T16:54:25Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Probabilistic Tracking with Deep Factors [8.030212474745879]
We show how to use a deep feature encoding in conjunction with generative densities over the features in a factor-graph based, probabilistic tracking framework.
We present a likelihood model that combines a learned feature encoder with generative densities over them, both trained in a supervised manner.
arXiv Detail & Related papers (2021-12-02T21:31:51Z)
- Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show that it is the incorporation of local information that makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.