Group-CAM: Group Score-Weighted Visual Explanations for Deep
Convolutional Networks
- URL: http://arxiv.org/abs/2103.13859v2
- Date: Fri, 26 Mar 2021 08:56:42 GMT
- Title: Group-CAM: Group Score-Weighted Visual Explanations for Deep
Convolutional Networks
- Authors: Qinglong Zhang, Lu Rao, Yubin Yang
- Abstract summary: We propose an efficient saliency map generation method, called Group score-weighted Class Activation Mapping (Group-CAM).
Group-CAM is efficient yet effective, requiring only dozens of queries to the network while producing target-related saliency maps.
- Score: 4.915848175689936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose an efficient saliency map generation method, called
Group score-weighted Class Activation Mapping (Group-CAM), which adopts the
"split-transform-merge" strategy to generate saliency maps. Specifically, for
an input image, the class activations are first split into groups. In each
group, the sub-activations are summed and de-noised to form an initial mask.
The initial masks are then transformed with meaningful perturbations and
applied to preserve sub-pixels of the input (i.e., masked inputs), which are
fed into the network to compute confidence scores. Finally, the initial masks
are combined in a weighted sum to form the final saliency map, where the
weights are the confidence scores produced by the masked inputs. Group-CAM is
efficient yet effective, requiring only dozens of queries to the network while
producing target-related saliency maps. As a result, Group-CAM can serve as an
effective data augmentation trick for fine-tuning networks. We comprehensively
evaluate the performance of Group-CAM on commonly used benchmarks, including
deletion and insertion tests on ImageNet-1k and pointing game tests on
COCO2017. Extensive experimental results demonstrate that Group-CAM achieves
better visual performance than current state-of-the-art explanation
approaches. The code is available at
https://github.com/wofmanaf/Group-CAM.
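The split-transform-merge pipeline described in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the `group_cam` function, the gradient weighting, the percentile-based de-noising, and the assumption that masks and input share a spatial resolution are all stand-ins for the paper's actual choices (which include blur-based perturbation and upsampling).

```python
import numpy as np

def group_cam(activations, grad_weights, image, score_fn, n_groups=8):
    """Sketch of Group-CAM's split-transform-merge pipeline.

    activations : (C, H, W) class activations from a conv layer
    grad_weights: (C,) per-channel importance (Grad-CAM-style weights)
    image       : (3, H, W) input, assumed to match the mask resolution
    score_fn    : callable returning a target-class confidence score
    """
    # Split: weight channels by importance, then partition into groups.
    weighted = grad_weights[:, None, None] * activations
    groups = np.array_split(weighted, n_groups, axis=0)

    masks, scores = [], []
    for g in groups:
        # Sum sub-activations within the group, then de-noise by zeroing
        # small values (a stand-in for the paper's filtering step).
        m = g.sum(axis=0)
        m[m < np.percentile(m, 70)] = 0.0
        # Normalize to [0, 1] so the map can act as a soft mask.
        if m.max() > m.min():
            m = (m - m.min()) / (m.max() - m.min())
        masks.append(m)
        # Transform: apply the mask to the input and query the network
        # for the confidence score of the masked input.
        masked_input = image * m[None, :, :]
        scores.append(score_fn(masked_input))

    # Merge: weighted sum of initial masks, weights = confidence scores.
    return sum(s * m for s, m in zip(scores, masks))
```

Because each group yields one network query, the total cost is `n_groups` forward passes, which is why the method needs only dozens of queries rather than the hundreds used by purely perturbation-based approaches.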
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Advancing Vision Transformers with Group-Mix Attention [59.585623293856735]
Group-Mix Attention (GMA) is an advanced replacement for traditional self-attention.
GMA simultaneously captures token-to-token, token-to-group, and group-to-group correlations with various group sizes.
GroupMixFormer achieves state-of-the-art performance in image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-11-26T01:25:03Z)
- Contrastive Grouping with Transformer for Referring Image Segmentation [23.276636282894582]
We propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer)
CGFormer explicitly captures object-level information via token-based querying and grouping strategy.
Experimental results demonstrate that CGFormer outperforms state-of-the-art methods in both segmentation and generalization settings consistently and significantly.
arXiv Detail & Related papers (2023-09-02T20:53:42Z)
- HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation [113.6560373226501]
This work studies semantic segmentation under the domain generalization setting.
We propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks.
Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers.
arXiv Detail & Related papers (2023-05-22T13:33:41Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation [98.306533433627]
Extracting class activation maps (CAM) is a key step in weakly-supervised semantic segmentation (WSSS).
This paper proposes a new method to couple CAM and the attention matrix in a probabilistic diffusion manner, dubbed AD-CAM.
Experiments show that AD-CAM as pseudo labels can yield stronger WSSS models than the state-of-the-art variants of CAM.
arXiv Detail & Related papers (2022-11-20T10:06:32Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- Green Hierarchical Vision Transformer for Masked Image Modeling [54.14989750044489]
We present an efficient approach for Masked Image Modeling with hierarchical Vision Transformers (ViTs)
We design a Group Window Attention scheme following the Divide-and-Conquer strategy.
We further improve the grouping strategy via the Dynamic Programming algorithm to minimize the overall cost of the attention on the grouped patches.
arXiv Detail & Related papers (2022-05-26T17:34:42Z)
- Dynamic Group Convolution for Accelerating Convolutional Neural Networks [23.644124360336754]
We propose dynamic group convolution (DGC) that adaptively selects which part of input channels to be connected within each group.
Multiple groups can adaptively capture abundant and complementary visual/semantic features for each input image.
DGC preserves the original network structure while matching the computational efficiency of conventional group convolution.
arXiv Detail & Related papers (2020-07-08T16:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.