Related papers: Cross Language Image Matching for Weakly Supervised Semantic Segmentation

Cross Language Image Matching for Weakly Supervised Semantic Segmentation

URL: http://arxiv.org/abs/2203.02668v1
Date: Sat, 5 Mar 2022 06:39:48 GMT
Title: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Authors: Jinheng Xie, Xianxu Hou, Kai Ye, Linlin Shen
Abstract summary: We propose a novel Cross Language Image Matching (CLIMS) framework, based on the Contrastive Language-Image Pre-training (CLIP) model. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions. In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions.
Score: 26.04918485403939
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: It has been widely known that CAM (Class Activation Map) usually only activates discriminative object regions and falsely includes lots of object-related backgrounds. As only a fixed set of image-level object labels are available to the WSSS (weakly supervised semantic segmentation) model, it could be very difficult to suppress those diverse background regions consisting of open set objects. In this paper, we propose a novel Cross Language Image Matching (CLIMS) framework, based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model, for WSSS. The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions. In particular, we design object, background region and text label matching losses to guide the model to excite more reasonable object regions for CAM of each category. In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions, with a predefined set of class-related background text descriptions. These designs enable the proposed CLIMS to generate a more complete and compact activation map for the target objects. Extensive experiments on PASCAL VOC2012 dataset show that our CLIMS significantly outperforms the previous state-of-the-art methods. Code will be available.

Related papers

SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation [36.41778553250247]
Weakly-Supervised Semantic (WSSS) aims to train segmentation models using image data with only image-level supervision. We propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP latent space. SemPLeS can perform better semantic alignment between object regions and the associated class labels.
arXiv Detail & Related papers (2024-01-22T09:41:05Z)
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation [100.0316479167605]
A class activation map (CAM) can only locate the most discriminative part of objects. We propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion. Our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively.
arXiv Detail & Related papers (2024-01-20T05:25:25Z)
Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation. We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS)
arXiv Detail & Related papers (2024-01-18T10:55:13Z)
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. New paradigm has emerged by generating a foreground prediction map to achieve pixel-level localization. This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion. We show that LAW-Diffusion yields the state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z)
Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$2$CRC) framework to assist the expansion of the activated object regions. We also introduce an object guided label refinement module to take a full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object (VOS) approach that strives to achieve better object-background discriminability. Our approach is based on a discriminative learning loss formulation that takes into account both object and background information. Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
Anti-Adversarially Manipulated Attributions for Weakly Supervised Semantic Segmentation and Object Localization [31.69344455448125]
We present an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer. This manipulation is realized in an anti-adversarial manner, so that the original image is perturbed along pixel gradients in directions opposite to those used in an adversarial attack. In addition, we introduce a new regularization procedure that inhibits the incorrect attribution of regions unrelated to the target object and the excessive concentration of attributions on a small region of the target object.
arXiv Detail & Related papers (2022-04-11T06:18:02Z)
Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$2$AM) generation using unlabeled image data. We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background. As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
arXiv Detail & Related papers (2022-03-25T08:46:24Z)
Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component. It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z)
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation [45.39679291105364]
We propose a region network RPNet to explore the cross-image object diversity of the training set. Similar object parts across images are identified via region feature comparison. Experiments show that the proposed method generates more complete and accurate pseudo object masks.
arXiv Detail & Related papers (2021-08-17T02:51:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.