One Explanation is Not Enough: Structured Attention Graphs for Image
Classification
- URL: http://arxiv.org/abs/2011.06733v4
- Date: Sun, 7 Nov 2021 09:58:28 GMT
- Title: One Explanation is Not Enough: Structured Attention Graphs for Image
Classification
- Authors: Vivswan Shitole, Li Fuxin, Minsuk Kahng, Prasad Tadepalli, Alan Fern
- Abstract summary: We introduce structured attention graphs (SAGs), which compactly represent sets of attention maps for an image.
We propose an approach to compute SAGs and a visualization for SAGs so that deeper insight can be gained into a classifier's decisions.
- Score: 30.640374946654806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention maps are a popular way of explaining the decisions of convolutional
networks for image classification. Typically, for each image of interest, a
single attention map is produced, which assigns weights to pixels based on
their importance to the classification. A single attention map, however,
provides an incomplete understanding since there are often many other maps that
explain a classification equally well. In this paper, we introduce structured
attention graphs (SAGs), which compactly represent sets of attention maps for
an image by capturing how different combinations of image regions impact a
classifier's confidence. We propose an approach to compute SAGs and a
visualization for SAGs so that deeper insight can be gained into a classifier's
decisions. We conduct a user study comparing the use of SAGs to traditional
attention maps for answering counterfactual questions about image
classifications. Our results show that the users are more correct when
answering comparative counterfactual questions based on SAGs compared to the
baselines.
Related papers
- Aggregating Local Saliency Maps for Semi-Global Explainable Image Classification [0.0]
Deep learning dominates image classification tasks, yet understanding how models arrive at predictions remains a challenge.<n>Much research focuses on local explanations of individual predictions, such as saliency maps, which visualise the influence of specific pixels on a model's prediction.<n>We propose Segment Attribution Tables (SATs), a method for summarising local saliency explanations into (semi-)global insights.
arXiv Detail & Related papers (2025-06-29T14:11:02Z) - Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing to train deep interpretable models.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z) - Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues [55.97779732051921]
A new learning strategy is proposed to explicitly incorporate au cues into classifier training.
We show that our strategy can improve layer-wise interpretability without degrading classification performance.
arXiv Detail & Related papers (2024-02-01T02:13:49Z) - A Self-Training Framework Based on Multi-Scale Attention Fusion for
Weakly Supervised Semantic Segmentation [7.36778096476552]
We propose a self-training method that utilizes fused multi-scale class-aware attention maps.
We collect information from attention maps of different scales and obtain multi-scale attention maps.
We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas.
arXiv Detail & Related papers (2023-05-10T02:16:12Z) - Symbolic image detection using scene and knowledge graphs [39.49756199669471]
We use a scene graph, a graph representation of an image, to capture visual components.
We generate a knowledge graph using facts extracted from ConceptNet to reason about objects and attributes.
We extend the network further to use an attention mechanism which learn the importance of the graph on representations.
arXiv Detail & Related papers (2022-06-10T04:06:28Z) - LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of
Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z) - Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z) - Weakly-Supervised Semantic Segmentation via Sub-category Exploration [73.03956876752868]
We propose a simple yet effective approach to enforce the network to pay attention to other parts of an object.
Specifically, we perform clustering on image features to generate pseudo sub-categories labels within each annotated parent class.
We conduct extensive analysis to validate the proposed method and show that our approach performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-03T20:48:31Z) - Zero-Shot Recognition through Image-Guided Semantic Classification [9.291055558504588]
We present a new embedding-based framework for zero-shot learning (ZSL)
Motivated by the binary relevance method for multi-label classification, we propose to inversely learn the mapping between an image and a semantic classifier.
IGSC is conceptually simple and can be realized by a slight enhancement of an existing deep architecture for classification.
arXiv Detail & Related papers (2020-07-23T06:22:40Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z) - Manifold-driven Attention Maps for Weakly Supervised Segmentation [9.289524646688244]
We propose a manifold driven attention-based network to enhance visual salient regions.
Our method generates superior attention maps directly during inference without the need of extra computations.
arXiv Detail & Related papers (2020-04-07T00:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.