Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
- URL: http://arxiv.org/abs/2501.11309v1
- Date: Mon, 20 Jan 2025 07:23:11 GMT
- Title: Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
- Authors: Ziheng Zhang, Jianyang Gu, Arpita Chowdhury, Zheda Mai, David Carlyn, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
- Abstract summary: Class activation map (CAM) has been widely used to highlight image regions that contribute to class predictions.
We propose Finer-CAM, a method that retains CAM's efficiency while achieving precise localization of discriminative regions.
- Score: 31.60751962128398
- License:
- Abstract: Class activation map (CAM) has been widely used to highlight image regions that contribute to class predictions. Despite its simplicity and computational efficiency, CAM often struggles to identify discriminative regions that distinguish visually similar fine-grained classes. Prior efforts address this limitation by introducing more sophisticated explanation processes, but at the cost of extra complexity. In this paper, we propose Finer-CAM, a method that retains CAM's efficiency while achieving precise localization of discriminative regions. Our key insight is that the deficiency of CAM lies not in "how" it explains, but in "what" it explains. Specifically, previous methods attempt to identify all cues contributing to the target class's logit value, which inadvertently also activates regions predictive of visually similar classes. By explicitly comparing the target class with similar classes and spotting their differences, Finer-CAM suppresses features shared with other classes and emphasizes the unique, discriminative details of the target class. Finer-CAM is easy to implement, compatible with various CAM methods, and can be extended to multi-modal models for accurate localization of specific concepts. Additionally, Finer-CAM allows adjustable comparison strength, enabling users to selectively highlight coarse object contours or fine discriminative details. Quantitatively, we show that masking out the top 5% of activated pixels by Finer-CAM results in a larger relative confidence drop compared to baselines. The source code and demo are available at https://github.com/Imageomics/Finer-CAM.
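The comparison idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest CAM setting (a linear classifier on globally pooled feature maps) and explains the logit difference between the target class and one similar class instead of the target logit alone. The function names, the `gamma` parameter standing in for the "adjustable comparison strength", and the toy data are all hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def cam(features, weights):
    """Classic CAM: weighted sum of feature maps over the channel axis.

    features: (K, H, W) activations of the last convolutional layer.
    weights:  (K,) linear-classifier weights for one class.
    """
    return np.tensordot(weights, features, axes=1)  # shape (H, W)

def difference_cam(features, w_target, w_similar, gamma=1.0):
    """CAM of the logit difference y_target - gamma * y_similar.

    Hypothetical helper: with a linear head, explaining the difference
    amounts to using the difference of the class weights.
    gamma = 0 falls back to the plain (rectified) target-class CAM.
    """
    saliency = cam(features, w_target - gamma * w_similar)
    return np.maximum(saliency, 0.0)  # keep positive evidence only

# Toy data (made up): channel 0 is a feature shared by both classes,
# channel 1 is a feature unique to the target class.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0  # shared cue at pixel (0, 0)
features[1, 1, 1] = 1.0  # discriminative cue at pixel (1, 1)
w_target = np.array([1.0, 1.0])
w_similar = np.array([1.0, 0.0])

plain = difference_cam(features, w_target, w_similar, gamma=0.0)
finer = difference_cam(features, w_target, w_similar, gamma=1.0)
```

In this toy case `plain` activates both pixels, while `finer` keeps only the pixel carrying the cue unique to the target class. Intermediate values of `gamma` would trade off between the two behaviours.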
Related papers
- BroadCAM: Outcome-agnostic Class Activation Mapping for Small-scale Weakly Supervised Applications [69.22739434619531]
We propose an outcome-agnostic CAM approach, called BroadCAM, for small-scale weakly supervised applications.
By evaluating BroadCAM on VOC2012 and BCSS-WSSS for WSSS and OpenImages30k for WSOL, BroadCAM demonstrates superior performance.
arXiv Detail & Related papers (2023-09-07T06:45:43Z)
- Extracting Class Activation Maps from Non-Discriminative Features as well [23.968856513180032]
Class activation maps (CAM) from a classification model often result in poor coverage of foreground objects.
We introduce a new computation method for CAM that explicitly captures non-discriminative features as well.
We call the resultant K cluster centers local prototypes; they represent local semantics like the "head", "leg", and "body" of "sheep".
arXiv Detail & Related papers (2023-03-18T04:47:42Z)
- Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation [98.306533433627]
Extracting class activation maps (CAM) is a key step for weakly-supervised semantic segmentation (WSSS).
This paper proposes a new method to couple CAM and Attention matrix in a probabilistic Diffusion way, and dub it AD-CAM.
Experiments show that AD-CAM as pseudo labels can yield stronger WSSS models than the state-of-the-art variants of CAM.
arXiv Detail & Related papers (2022-11-20T10:06:32Z)
- Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
- FD-CAM: Improving Faithfulness and Discriminability of Visual Explanation for CNNs [7.956110316017118]
Class activation map (CAM) has been widely studied for visual explanation of the internal working mechanism of convolutional neural networks.
We propose a novel CAM weighting scheme, named FD-CAM, to improve both the faithfulness and discriminability of the CNN visual explanation.
arXiv Detail & Related papers (2022-06-17T14:08:39Z)
- Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation [88.55040177178442]
Extracting class activation maps (CAM) is arguably the most standard step of generating pseudo masks for semantic segmentation.
Yet, the crux of the unsatisfactory pseudo masks is the binary cross-entropy loss (BCE) widely used in CAM.
We introduce an embarrassingly simple yet surprisingly effective method: reactivating the converged CAM with BCE by using softmax cross-entropy loss (SCE).
The evaluation on both PASCAL VOC and MSCOCO shows that ReCAM not only generates high-quality masks, but also supports plug-and-play in any CAM variant with little overhead.
arXiv Detail & Related papers (2022-03-02T09:14:58Z)
- Towards Learning Spatially Discriminative Feature Representations [26.554140976236052]
We propose a novel loss function, termed CAM-loss, to constrain the embedded feature maps with the class activation maps (CAMs).
CAM-loss drives the backbone to express the features of target category and suppress the features of non-target categories or background.
Experimental results show that CAM-loss is applicable to a variety of network structures and can be combined with mainstream regularization methods to improve the performance of image classification.
arXiv Detail & Related papers (2021-09-03T08:04:17Z)
- TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in the visual transformer for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z)
- Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks [89.56292219019163]
Explanation methods facilitate the development of models that learn meaningful concepts and avoid exploiting spurious correlations.
We illustrate a previously unrecognized limitation of the popular neural network explanation method Grad-CAM.
We propose HiResCAM, a class-specific explanation method that is guaranteed to highlight only the locations the model used to make each prediction.
arXiv Detail & Related papers (2020-11-17T19:26:14Z)
- Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs [29.731732363623713]
Class Activation Mapping (CAM) methods have been proposed to discover the connection between a CNN's decision and image regions.
In this paper, we introduce two axioms -- Conservation and Sensitivity -- to the visualization paradigm of the CAM methods.
A dedicated Axiom-based Grad-CAM (XGrad-CAM) is proposed to satisfy these axioms as much as possible.
arXiv Detail & Related papers (2020-08-05T18:42:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.