Contrastive learning of Class-agnostic Activation Map for Weakly
Supervised Object Localization and Semantic Segmentation
- URL: http://arxiv.org/abs/2203.13505v1
- Date: Fri, 25 Mar 2022 08:46:24 GMT
- Title: Contrastive learning of Class-agnostic Activation Map for Weakly
Supervised Object Localization and Semantic Segmentation
- Authors: Jinheng Xie, Jianfeng Xiang, Junliang Chen, Xianxu Hou, Xiaodong Zhao,
Linlin Shen
- Abstract summary: We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
- Score: 32.76127086403596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While class activation map (CAM) generated by image classification network
has been widely used for weakly supervised object localization (WSOL) and
semantic segmentation (WSSS), such classifiers usually focus on discriminative
object regions. In this paper, we propose Contrastive learning for
Class-agnostic Activation Map (C$^2$AM) generation only using unlabeled image
data, without the involvement of image-level supervision. The core idea comes
from the observation that i) semantic information of foreground objects usually
differs from their backgrounds; ii) foreground objects with similar appearance
or background with similar color/texture have similar representations in the
feature space. We form the positive and negative pairs based on the above
relations and force the network to disentangle foreground and background with a
class-agnostic activation map using a novel contrastive loss. As the network is
guided to discriminate cross-image foreground-background, the class-agnostic
activation maps learned by our approach generate more complete object regions.
We successfully extract class-agnostic object bounding boxes from C$^2$AM for
object localization, as well as background cues to refine the CAM generated by a
classification network for semantic segmentation. Extensive experiments on
CUB-200-2011, ImageNet-1K, and PASCAL VOC2012 datasets show that both WSOL and
WSSS can benefit from the proposed C$^2$AM.
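The cross-image contrastive idea described in the abstract (positive foreground-foreground and background-background pairs pulled together, negative foreground-background pairs pushed apart) can be sketched roughly as follows. This is a simplified, hypothetical NumPy illustration, not the authors' implementation: the similarity mapping, the epsilon term, and the exact loss form are assumptions, and `contrastive_fg_bg_loss` is an invented name.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and b, shifted from [-1, 1] into [0, 1]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T + 1.0) / 2.0

def contrastive_fg_bg_loss(fg, bg, eps=1e-6):
    """Simplified cross-image contrastive loss over pooled per-image
    foreground (fg) and background (bg) feature vectors, shape (n, d):
    maximize fg-fg and bg-bg similarity, minimize fg-bg similarity."""
    s_ff = cosine_sim(fg, fg)  # positive pairs: foreground vs foreground
    s_bb = cosine_sim(bg, bg)  # positive pairs: background vs background
    s_fb = cosine_sim(fg, bg)  # negative pairs: foreground vs background
    n = fg.shape[0]
    mask = ~np.eye(n, dtype=bool)  # drop trivial self-similarity on the diagonal
    l_pos = -np.log(s_ff[mask] + eps).mean() - np.log(s_bb[mask] + eps).mean()
    l_neg = -np.log(1.0 - s_fb + eps).mean()
    return l_pos + l_neg
```

On well-disentangled features (foreground and background vectors pointing in different directions) this loss is small; if the "background" features collapse onto the foreground ones, the negative term blows up, which is the behaviour the contrastive objective exploits.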
Related papers
- Question-Answer Cross Language Image Matching for Weakly Supervised
Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation.
We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS).
arXiv Detail & Related papers (2024-01-18T10:55:13Z)
- Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly
Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
- Cross Language Image Matching for Weakly Supervised Semantic
Segmentation [26.04918485403939]
We propose a novel Cross Language Image Matching (CLIMS) framework, based on the Contrastive Language-Image Pre-training (CLIP) model.
The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions.
In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions.
arXiv Detail & Related papers (2022-03-05T06:39:48Z)
- Background-aware Classification Activation Map for Weakly Supervised
Object Localization [14.646874544729426]
We propose a background-aware classification activation map (B-CAM) to simultaneously learn localization scores of both object and background.
Our B-CAM can be trained in an end-to-end manner based on a proposed stagger classification loss.
Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.
arXiv Detail & Related papers (2021-12-29T03:12:09Z)
- Cross-Image Region Mining with Region Prototypical Network for Weakly
Supervised Segmentation [45.39679291105364]
We propose a region prototypical network (RPNet) to explore the cross-image object diversity of the training set.
Similar object parts across images are identified via region feature comparison.
Experiments show that the proposed method generates more complete and accurate pseudo object masks.
arXiv Detail & Related papers (2021-08-17T02:51:02Z)
- Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation [73.19139431806853]
Referring image segmentation aims to segment the objects referred by a natural language expression.
Previous methods usually focus on designing an implicit and recurrent interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask.
We present a "Locate-Then-Segment" scheme to tackle these problems.
Our framework is simple but surprisingly effective.
arXiv Detail & Related papers (2021-03-30T12:25:27Z)
- TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised
Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformers for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z)
- Weakly-Supervised Semantic Segmentation via Sub-category Exploration [73.03956876752868]
We propose a simple yet effective approach to enforce the network to pay attention to other parts of an object.
Specifically, we perform clustering on image features to generate pseudo sub-category labels within each annotated parent class.
We conduct extensive analysis to validate the proposed method and show that our approach performs favorably against the state-of-the-art approaches.
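The sub-category exploration step above (clustering the image features of each parent class to obtain pseudo sub-category labels) could be sketched as below. This is a hypothetical NumPy illustration using plain k-means with a deterministic farthest-point initialisation, not the authors' implementation; `pseudo_subcategory_labels`, `k`, and `iters` are assumed names and parameters, and parent labels are assumed to be integers.

```python
import numpy as np

def pseudo_subcategory_labels(features, parent_labels, k=2, iters=20):
    """Cluster the feature vectors of each parent class into k pseudo
    sub-categories; parent class c is mapped to sub-labels c*k .. c*k+k-1."""
    sub = np.zeros(len(features), dtype=int)
    for c in np.unique(parent_labels):
        idx = np.where(parent_labels == c)[0]
        x = features[idx]
        # deterministic farthest-point initialisation of the k centers
        centers = [x[0]]
        while len(centers) < k:
            d = np.min([np.linalg.norm(x - ctr, axis=1) for ctr in centers], axis=0)
            centers.append(x[d.argmax()])
        centers = np.array(centers)
        # plain k-means iterations: assign, then recompute centers
        for _ in range(iters):
            d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = x[assign == j].mean(axis=0)
        sub[idx] = c * k + assign
    return sub
```

The resulting sub-labels can then serve as finer-grained classification targets, encouraging the network to attend to parts of an object beyond the most discriminative region.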
arXiv Detail & Related papers (2020-08-03T20:48:31Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.