Attention-based Class Activation Diffusion for Weakly-Supervised
Semantic Segmentation
- URL: http://arxiv.org/abs/2211.10931v1
- Date: Sun, 20 Nov 2022 10:06:32 GMT
- Title: Attention-based Class Activation Diffusion for Weakly-Supervised
Semantic Segmentation
- Authors: Jianqiang Huang, Jian Wang, Qianru Sun and Hanwang Zhang
- Abstract summary: Extracting class activation maps (CAM) is a key step for weakly-supervised semantic segmentation (WSSS).
This paper proposes a new method to couple the CAM and the attention matrix in a probabilistic diffusion way, and dubs it AD-CAM.
Experiments show that AD-CAM as pseudo labels can yield stronger WSSS models than state-of-the-art variants of CAM.
- Score: 98.306533433627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting class activation maps (CAM) is a key step for weakly-supervised
semantic segmentation (WSSS). The CAM of convolutional neural networks fails to
capture long-range feature dependencies in the image, resulting in coverage of
only foreground object parts, i.e., many false negatives. An intuitive
solution is ``coupling'' the CAM with the long-range attention matrix of vision
transformers (ViT). We find that direct ``coupling'', e.g., pixel-wise
multiplication of attention and activation, achieves more global coverage (of
the foreground), but unfortunately comes with a great increase of false
positives, i.e., background pixels are mistakenly included. This paper aims to
tackle this issue. It proposes a new method to couple the CAM and the attention
matrix in a probabilistic diffusion way, dubbed AD-CAM. Intuitively, it integrates
ViT attention and CAM activation in a conservative and convincing way.
Conservativeness is achieved by refining the attention between a pair of pixels
based on their respective attentions to common neighbors, where the intuition
is that two pixels with very different neighborhoods are rarely dependent, i.e.,
their attention should be reduced. Convincingness is achieved by diffusing a
pixel's activation to its neighbors (on the CAM) in proportion to the
corresponding attentions (on the AM). In experiments, our results on two
challenging WSSS benchmarks, PASCAL VOC and MS COCO, show that AD-CAM as pseudo
labels can yield stronger WSSS models than state-of-the-art variants of
CAM.
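The two steps described in the abstract (conservatively refining attention via common neighbors, then diffusing activations in proportion to attention) can be sketched as follows. This is a minimal numpy illustration of the idea, not the authors' implementation; the function name and the exact refinement (cosine similarity of attention rows) are assumptions.

```python
import numpy as np

def ad_cam_diffusion(cam, attention):
    """Hypothetical sketch of attention-based class activation diffusion.

    cam:       (N,) class activation per token/pixel
    attention: (N, N) ViT attention matrix (rows are queries)
    """
    # Conservative step: refine the attention between two tokens by the
    # similarity of their attention distributions over common neighbors.
    # Tokens with very different neighborhoods get their attention reduced.
    norm = attention / (np.linalg.norm(attention, axis=1, keepdims=True) + 1e-8)
    neighbor_sim = norm @ norm.T          # cosine similarity of attention rows
    refined = attention * neighbor_sim    # down-weight dissimilar pairs

    # Convincing step: diffuse each token's activation to its neighbors in
    # proportion to the refined attention (row-normalized to a distribution).
    refined = refined / (refined.sum(axis=1, keepdims=True) + 1e-8)
    return refined @ cam
```

With an identity attention matrix no diffusion occurs and the CAM is returned unchanged; off-diagonal attention spreads activation toward attended neighbors, which is what widens foreground coverage without the unchecked false positives of a direct pixel-wise product.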
Related papers
- All-pairs Consistency Learning for Weakly Supervised Semantic
Segmentation [42.66269050864235]
We propose a new transformer-based regularization to better localize objects for weakly-supervised semantic segmentation (WSSS).
We adopt vision transformers, as their self-attention mechanism naturally embeds pair-wise affinity.
Our method produces noticeably better class localization maps (67.3% mIoU on the PASCAL VOC train set).
arXiv Detail & Related papers (2023-08-08T15:14:23Z) - Importance Sampling CAMs for Weakly-Supervised Segmentation [16.86352815414646]
Class activation maps (CAMs) can be used to localize and segment objects in images.
In this work, we approach both problems with two contributions for improving CAM learning.
We conduct experiments on the PASCAL VOC 2012 benchmark dataset to demonstrate that these modifications significantly increase the performance in terms of contour accuracy.
arXiv Detail & Related papers (2022-03-23T14:54:29Z) - Self-supervised Image-specific Prototype Exploration for Weakly
Supervised Semantic Segmentation [72.33139350241044]
Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to its low annotation cost.
We propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss.
Our SIPE achieves new state-of-the-art performance using only image-level labels.
arXiv Detail & Related papers (2022-03-06T09:01:03Z) - Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation [88.55040177178442]
Class activation maps (CAM) are arguably the most standard step of generating pseudo masks for semantic segmentation.
Yet, the crux of the unsatisfactory pseudo masks is the binary cross-entropy loss (BCE) widely used in CAM.
We introduce an embarrassingly simple yet surprisingly effective method: reactivating the converged CAM with BCE by using softmax cross-entropy loss (SCE).
The evaluation on both PASCAL VOC and MSCOCO shows that ReCAM not only generates high-quality masks, but also supports plug-and-play in any CAM variant with little overhead.
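The BCE-versus-SCE distinction that motivates ReCAM can be sketched numerically; this is an illustrative numpy snippet under assumed shapes, not the authors' code. BCE scores each class independently, so a pixel can respond to several correlated classes; softmax cross-entropy makes the classes compete.

```python
import numpy as np

def recam_losses(logits, labels):
    """Sketch of the two losses behind ReCAM (hypothetical interface).

    logits: (B, C) class scores pooled from a converged CAM branch
    labels: (B, C) multi-hot image-level labels
    """
    # BCE: each class is an independent binary problem, so correlated
    # classes can all fire on the same region.
    p = 1.0 / (1.0 + np.exp(-logits))
    bce = -np.mean(labels * np.log(p + 1e-8)
                   + (1 - labels) * np.log(1 - p + 1e-8))

    # SCE: softmax couples the classes, suppressing responses to the
    # wrong (but correlated) classes when the CAM is reactivated.
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    target = labels / labels.sum(axis=1, keepdims=True)
    sce = -np.mean((target * log_softmax).sum(axis=1))
    return bce, sce
```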
arXiv Detail & Related papers (2022-03-02T09:14:58Z) - PCAM: Product of Cross-Attention Matrices for Rigid Registration of
Point Clouds [79.99653758293277]
PCAM is a neural network whose key element is a pointwise product of cross-attention matrices.
We show that PCAM achieves state-of-the-art results among methods which, like us, solve steps (a) and (b) jointly via deepnets.
arXiv Detail & Related papers (2021-10-04T09:23:27Z) - TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised
Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in vision transformers for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z) - Puzzle-CAM: Improved localization via matching partial and full features [0.5482532589225552]
Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the gap for semantic segmentation performance from pixel-level supervision to image-level supervision.
Most advanced approaches are based on class activation maps (CAMs) to generate pseudo-labels to train the segmentation network.
We propose Puzzle-CAM, a process that minimizes differences between the features from separate patches and the whole image.
In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 dataset.
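Puzzle-CAM's patch-versus-whole consistency can be sketched as a reconstruction loss between the full-image CAM and the CAMs of its tiles stitched back together. The interface below (a `cam_fn` callable, a 2x2 tiling) is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def puzzle_consistency(cam_full, cam_fn, image):
    """Hypothetical sketch of Puzzle-CAM's matching of partial and full features.

    cam_full: (H, W) CAM predicted on the whole image
    cam_fn:   callable mapping an image tile (h, w) to a CAM (h, w)
    image:    (H, W) input image (single channel for simplicity)
    """
    H, W = image.shape
    h, w = H // 2, W // 2
    merged = np.zeros_like(cam_full)
    # Predict a CAM per quadrant, then re-assemble the puzzle.
    for i in (0, 1):
        for j in (0, 1):
            tile = image[i*h:(i+1)*h, j*w:(j+1)*w]
            merged[i*h:(i+1)*h, j*w:(j+1)*w] = cam_fn(tile)
    # L1 gap between the full-image CAM and the merged tile CAMs;
    # minimizing this pushes the two views to agree.
    return np.abs(cam_full - merged).mean()
```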
arXiv Detail & Related papers (2021-01-27T08:19:38Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
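The equivariance constraint that SEAM exploits says that predicting a CAM and then transforming it should match transforming the image first and then predicting. A minimal sketch with a horizontal flip as the transform (the `cam_fn` interface is assumed for illustration):

```python
import numpy as np

def equivariance_loss(cam_fn, image):
    """Hypothetical sketch of SEAM-style consistency regularization.

    cam_fn: callable mapping an image (H, W) to a CAM (H, W)
    image:  (H, W) input image
    """
    # "Transform then predict" vs. "predict then transform".
    cam_of_flipped = cam_fn(np.fliplr(image))
    flipped_cam = np.fliplr(cam_fn(image))
    # Any gap between the two is extra self-supervision for the network.
    return np.abs(cam_of_flipped - flipped_cam).mean()
```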