LICO: Explainable Models with Language-Image Consistency
- URL: http://arxiv.org/abs/2310.09821v1
- Date: Sun, 15 Oct 2023 12:44:33 GMT
- Title: LICO: Explainable Models with Language-Image Consistency
- Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan
- Abstract summary: This paper develops a Language-Image COnsistency model for explainable image classification, termed LICO.
We first establish a coarse global manifold structure alignment by minimizing the distance between the distributions of image and language features.
We then achieve fine-grained saliency maps by applying optimal transport (OT) theory to assign local feature maps with class-specific prompts.
- Score: 39.869639626266554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpreting the decisions of deep learning models has been actively studied
since the explosion of deep neural networks. One of the most convincing
interpretation approaches is salience-based visual interpretation, such as
Grad-CAM, where the generation of attention maps depends merely on categorical
labels. Although existing interpretation methods can provide explainable
decision clues, they often yield partial correspondence between image and
saliency maps due to the limited discriminative information from one-hot
labels. This paper develops a Language-Image COnsistency model for explainable
image classification, termed LICO, by correlating learnable linguistic prompts
with corresponding visual features in a coarse-to-fine manner. Specifically, we
first establish a coarse global manifold structure alignment by minimizing the
distance between the distributions of image and language features. We then
achieve fine-grained saliency maps by applying optimal transport (OT) theory to
assign local feature maps with class-specific prompts. Extensive experimental
results on eight benchmark datasets demonstrate that the proposed LICO achieves
a significant improvement in generating more explainable attention maps in
conjunction with existing interpretation methods such as Grad-CAM. Remarkably,
LICO improves the classification performance of existing models without
introducing any computational overhead during inference. Source code is made
available at https://github.com/ymLeiFDU/LICO.
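The fine-grained alignment step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it uses entropic-regularized optimal transport solved with Sinkhorn iterations (a standard choice for OT assignment problems) to match local image feature vectors to prompt token features. All shapes, names, and the cosine-distance cost are assumptions for the sketch.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic-regularized OT between uniform marginals via Sinkhorn iterations."""
    n, m = cost.shape
    K = np.exp(-cost / eps)        # Gibbs kernel from the cost matrix
    a = np.ones(n) / n             # uniform marginal over feature locations
    b = np.ones(m) / m             # uniform marginal over prompt tokens
    u = np.ones(n) / n
    v = np.ones(m) / m
    for _ in range(n_iters):       # alternate scaling updates until convergence
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)

rng = np.random.default_rng(0)
F = rng.normal(size=(49, 64))   # e.g. 7x7 grid of 64-d local image features
G = rng.normal(size=(16, 64))   # e.g. 16 learnable prompt token features
# cosine-distance cost between every feature location and every prompt token
Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
cost = 1.0 - Fn @ Gn.T
P = sinkhorn(cost)
ot_distance = float((P * cost).sum())   # candidate fine-grained alignment loss term
```

Minimizing such an OT distance during training would encourage each spatial feature location to align with a class-specific prompt token, which is the intuition behind the fine-grained saliency maps; the coarse stage would similarly minimize a distance between the global distributions of image and language features.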
Related papers
- Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification [5.087579454836169]
State-of-the-art explainability methods generate saliency maps to show where a specific class is identified.
We introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network.
We also show an approach to generate global explanations by aggregating labels across multiple images.
arXiv Detail & Related papers (2024-05-06T09:21:35Z)
- Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues [55.97779732051921]
A new learning strategy is proposed to explicitly incorporate AU (action unit) cues into classifier training.
We show that our strategy can improve layer-wise interpretability without degrading classification performance.
arXiv Detail & Related papers (2024-02-01T02:13:49Z)
- EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models [52.3015009878545]
We develop an image segmentor capable of generating fine-grained segmentation maps without any additional training.
Our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps.
In extensive experiments, the produced segmentation maps are demonstrated to be well delineated and capture detailed parts of the images.
arXiv Detail & Related papers (2024-01-22T07:34:06Z)
- Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation.
We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS).
arXiv Detail & Related papers (2024-01-18T10:55:13Z)
- Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification [17.373054348176932]
In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed.
FAM can interpret deep learning models that do not use a fully connected (FC) layer as the classifier.
Experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
arXiv Detail & Related papers (2023-07-11T05:33:46Z)
- Decom-CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map [23.71680014689873]
Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location.
This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM).
Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly.
arXiv Detail & Related papers (2023-05-27T14:33:01Z)
- From Heatmaps to Structural Explanations of Image Classifiers [31.44267537307587]
The paper begins by describing the explainable neural network (XNN), which attempts to extract and visualize several high-level concepts purely from the deep network.
Realizing that an important missing piece is a reliable heatmap visualization tool, we have developed I-GOS and iGOS++.
Through the research process, we have gained many insights into building deep network explanations.
arXiv Detail & Related papers (2021-09-13T23:39:57Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract more descriptive features on each selected band, preserving local structures and subtle changes of a region.
Secondly, adaptive neighbor assignment is introduced in the construction of the anchor graphs to reduce computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.