Cross Language Image Matching for Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2203.02668v1
- Date: Sat, 5 Mar 2022 06:39:48 GMT
- Title: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
- Authors: Jinheng Xie, Xianxu Hou, Kai Ye, Linlin Shen
- Abstract summary: We propose a novel Cross Language Image Matching (CLIMS) framework, based on the Contrastive Language-Image Pre-training (CLIP) model.
The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions.
In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions.
- Score: 26.04918485403939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is widely known that CAM (Class Activation Map) usually activates
only the discriminative object regions and falsely includes many object-related
background regions. As only a fixed set of image-level object labels is
available to the WSSS (weakly supervised semantic segmentation) model, it
can be very difficult to suppress the diverse background regions consisting
of open-set objects. In this paper, we propose a novel Cross Language Image
Matching (CLIMS) framework, based on the recently introduced Contrastive
Language-Image Pre-training (CLIP) model, for WSSS. The core idea of our
framework is to introduce natural language supervision to activate more
complete object regions and suppress closely-related open background regions.
In particular, we design object, background region and text label matching
losses to guide the model to excite more reasonable object regions for CAM of
each category. In addition, we design a co-occurring background suppression
loss to prevent the model from activating closely-related background regions,
with a predefined set of class-related background text descriptions. These
designs enable the proposed CLIMS to generate a more complete and compact
activation map for the target objects. Extensive experiments on the PASCAL
VOC2012 dataset show that our CLIMS significantly outperforms the previous
state-of-the-art methods. Code will be available.
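The abstract names the object-to-text and background-to-text matching losses and the co-occurring background suppression loss, but not their formulas. The following is a minimal Python sketch of how such CLIP-based matching losses could be written, assuming the CLIP image embeddings of the foreground-masked image (image times activation map P_k) and the background-masked image (image times 1 - P_k), plus the text embeddings of the class prompts and of the class-related background descriptions, have already been computed. The function names, the mapping of cosine similarity into (0, 1), and the exact log-likelihood form of each term are assumptions of this sketch, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-6  # keeps log() finite when a score reaches 0 or 1


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + EPS)


def score(a: Sequence[float], b: Sequence[float]) -> float:
    """Map cosine similarity from [-1, 1] into (0, 1); an assumption of this sketch."""
    return min(max((1.0 + cosine(a, b)) / 2.0, EPS), 1.0 - EPS)


def clims_style_losses(
    labels: List[int],                        # y_k in {0, 1}: image-level labels
    fg_emb: List[List[float]],                # embedding of image * P_k, per class k
    bg_emb: List[List[float]],                # embedding of image * (1 - P_k), per class k
    class_text: List[List[float]],            # embedding of a class prompt, per class k
    cooc_text: Dict[int, List[List[float]]],  # hypothetical per-class background texts
) -> Dict[str, float]:
    l_obj = l_bg = l_cbs = 0.0
    for k, y in enumerate(labels):
        if not y:
            continue  # only classes present in the image contribute
        # object region <-> class text: pull the foreground toward its label
        l_obj -= math.log(score(fg_emb[k], class_text[k]))
        # background region <-> class text: push the background away from the label
        l_bg -= math.log(1.0 - score(bg_emb[k], class_text[k]))
        # co-occurring background suppression: push the foreground away from
        # class-related background descriptions (e.g. "railroad" for "train")
        for t in cooc_text.get(k, []):
            l_cbs -= math.log(1.0 - score(fg_emb[k], t))
    return {"object": l_obj, "background": l_bg, "cooccur": l_cbs}
```

With toy two-dimensional embeddings where the foreground embedding equals the class text embedding and the background embedding points the opposite way, the object and background terms are close to zero, while a background description orthogonal to the foreground adds log 2 to the suppression term. In a real model these terms would be weighted and back-propagated into the network producing P_k; that part is outside this sketch.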
Related papers
- SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation [36.41778553250247]
Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using image data with only image-level supervision.
We propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP latent space.
SemPLeS can perform better semantic alignment between object regions and the associated class labels.
(2024-01-22T09:41:05Z)
- Spatial Structure Constraints for Weakly Supervised Semantic Segmentation [100.0316479167605]
A class activation map (CAM) can only locate the most discriminative parts of objects.
We propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
Our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively.
(2024-01-20T05:25:25Z)
- Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation.
We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS).
(2024-01-18T10:55:13Z)
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
(2023-09-22T15:44:10Z)
- LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially in terms of coherent object relations.
(2023-08-13T08:06:18Z)
- Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
(2022-06-20T03:40:56Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and YouTube-VOS.
(2022-04-22T17:53:27Z)
- Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
(2022-03-25T08:46:24Z)
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
(2022-01-01T03:09:15Z)
- Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation [45.39679291105364]
We propose a region prototypical network (RPNet) to explore the cross-image object diversity of the training set.
Similar object parts across images are identified via region feature comparison.
Experiments show that the proposed method generates more complete and accurate pseudo object masks.
(2021-08-17T02:51:02Z)
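One entry above reports results as mean intersection-over-union (mIoU): 72.7% on PASCAL VOC 2012 and 47.0% on COCO. As a hedged illustration of that metric, the sketch below accumulates a confusion matrix over flattened per-pixel labels and averages the per-class IoU = TP / (TP + FP + FN). The ignore value 255 follows the common PASCAL VOC convention for boundary pixels; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions of this sketch.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    # confusion[g][p] counts pixels whose ground truth is g and prediction is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels (e.g. VOC object boundaries) are skipped
        confusion[g][p] += 1
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # ground truth c, predicted otherwise
        fp = sum(row[c] for row in confusion) - tp   # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For a whole validation set, pass the concatenated pixels of all images so the confusion matrix is accumulated before averaging, rather than averaging per-image scores.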