Spatial Structure Constraints for Weakly Supervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2401.11122v1
- Date: Sat, 20 Jan 2024 05:25:25 GMT
- Title: Spatial Structure Constraints for Weakly Supervised Semantic
Segmentation
- Authors: Tao Chen, Yazhou Yao, Xingguo Huang, Zechao Li, Liqiang Nie and Jinhui
Tang
- Abstract summary: A class activation map (CAM) can only locate the most discriminative part of objects.
We propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation caused by attention expansion.
Our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively.
- Score: 100.0316479167605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The image-level label has prevailed in weakly supervised semantic
segmentation tasks due to its easy availability. Since image-level labels can
only indicate the existence or absence of specific categories of objects,
visualization-based techniques have been widely adopted to provide object
location clues. Considering class activation maps (CAMs) can only locate the
most discriminative part of objects, recent approaches usually adopt an
expansion strategy to enlarge the activation area for more integral object
localization. However, without proper constraints, the expanded activation will
easily intrude into the background region. In this paper, we propose spatial
structure constraints (SSC) for weakly supervised semantic segmentation to
alleviate the unwanted object over-activation caused by attention expansion.
Specifically, we propose a CAM-driven reconstruction module to directly
reconstruct the input image from deep CAM features, which constrains the
diffusion of last-layer object attention by preserving the coarse spatial
structure of the image content. Moreover, we propose an activation
self-modulation module to refine CAMs with finer spatial structure details by
enhancing regional consistency. Without external saliency models to provide
background clues, our approach achieves 72.7% and 47.0% mIoU on the PASCAL
VOC 2012 and COCO datasets, respectively, demonstrating the superiority of our
proposed approach.
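To make the CAM-driven reconstruction constraint concrete, here is a minimal, hypothetical PyTorch-style sketch, not the authors' released code: the decoder layout, the sigmoid gating of the class maps, the output resolution, and the L1 loss are illustrative assumptions. The idea it shows is that a lightweight decoder rebuilds a downsampled copy of the input image from CAM-gated deep features, so minimizing the reconstruction error discourages the last-layer attention from diffusing into the background.

```python
import torch.nn as nn
import torch.nn.functional as F

class CAMReconstructionConstraint(nn.Module):
    """Hypothetical sketch of a CAM-driven reconstruction constraint:
    a small decoder rebuilds a low-resolution copy of the input image
    from CAM-gated deep features, so the last-layer attention must
    preserve the coarse spatial structure of the image content.
    The decoder layout, sigmoid gating, and L1 loss are assumptions."""

    def __init__(self, feat_dim: int, out_hw: int = 56):
        super().__init__()
        self.out_hw = out_hw
        # Lightweight decoder mapping gated features back to RGB.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 1),
        )

    def forward(self, feats, cams, image):
        # feats: (B, C, h, w) last-layer features; cams: (B, K, h, w) class maps.
        # Gate each location by its strongest class activation.
        gate = cams.max(dim=1, keepdim=True).values.sigmoid()
        recon = self.decoder(feats * gate)
        recon = F.interpolate(recon, size=(self.out_hw, self.out_hw),
                              mode="bilinear", align_corners=False)
        target = F.interpolate(image, size=(self.out_hw, self.out_hw),
                               mode="bilinear", align_corners=False)
        # Reconstruction error penalizes attention diffusing into background.
        return F.l1_loss(recon, target)
```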
Related papers
- Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation [28.233690786378393]
We propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) approach for weakly supervised semantic segmentation.
arXiv Detail & Related papers (2024-07-03T02:54:33Z) - Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged in which a foreground prediction map is generated to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims at performing segmentation in query images given only a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z) - Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly
Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I2CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z) - Anti-Adversarially Manipulated Attributions for Weakly Supervised
Semantic Segmentation and Object Localization [31.69344455448125]
We present an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer.
This manipulation is realized in an anti-adversarial manner, so that the original image is perturbed along pixel gradients in directions opposite to those used in an adversarial attack.
In addition, we introduce a new regularization procedure that inhibits the incorrect attribution of regions unrelated to the target object and the excessive concentration of attributions on a small region of the target object. (A minimal illustrative sketch of this anti-adversarial perturbation step is given after this list.)
arXiv Detail & Related papers (2022-04-11T06:18:02Z) - Cross Language Image Matching for Weakly Supervised Semantic
Segmentation [26.04918485403939]
We propose a novel Cross Language Image Matching (CLIMS) framework, based on the Contrastive Language-Image Pre-training (CLIP) model.
The core idea of our framework is to introduce natural language supervision to activate more complete object regions and suppress closely-related open background regions.
In addition, we design a co-occurring background suppression loss to prevent the model from activating closely-related background regions.
arXiv Detail & Related papers (2022-03-05T06:39:48Z) - Unveiling the Potential of Structure-Preserving for Weakly Supervised
Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-process approach, termed self-correlation map generating (SCG) module to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z) - Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks, and it outperforms existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
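As noted in the entry on anti-adversarially manipulated attributions above, the core mechanism is to perturb the image along pixel gradients in the direction that increases the target class score, the reverse of an adversarial attack. Below is a minimal, hypothetical PyTorch-style sketch of that idea; the `model.cam` helper, the signed-gradient step, the step size, and the iteration count are illustrative assumptions rather than the paper's implementation.

```python
import torch

def anti_adversarial_cam(model, image, class_idx, steps=8, step_size=0.008):
    """Hypothetical sketch: nudge the image ALONG the gradient that raises
    the target class score (the reverse of an adversarial attack) and
    average the class activation maps over the perturbed images.
    `model` returning pre-softmax logits and the `model.cam` helper are
    assumptions for illustration, not a released interface."""
    x = image.clone().detach()
    accumulated = None
    for _ in range(steps):
        x.requires_grad_(True)
        score = model(x)[:, class_idx].sum()          # pre-softmax class score
        grad, = torch.autograd.grad(score, x)
        x = (x + step_size * grad.sign()).detach()    # simple ascent step (illustrative)
        with torch.no_grad():
            cam = model.cam(x)[:, class_idx]          # assumed CAM helper
        accumulated = cam if accumulated is None else accumulated + cam
    return accumulated / steps
```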