Exploiting Shape Cues for Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2208.04286v1
- Date: Mon, 8 Aug 2022 17:25:31 GMT
- Title: Exploiting Shape Cues for Weakly Supervised Semantic Segmentation
- Authors: Sungpil Kho, Pilhyeon Lee, Wonyoung Lee, Minsong Ki, Hyeran Byun
- Abstract summary: Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased features of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise
class predictions with only image-level labels for training. To this end,
previous methods adopt the common pipeline: they generate pseudo masks from
class activation maps (CAMs) and use such masks to supervise segmentation
networks. However, it is challenging to derive comprehensive pseudo masks that
cover the whole extent of objects due to the local property of CAMs, i.e., they
tend to focus solely on small discriminative object parts. In this paper, we
associate the locality of CAMs with the texture-biased property of
convolutional neural networks (CNNs). Accordingly, we propose to exploit shape
information to supplement the texture-biased CNN features, thereby encouraging
mask predictions to be not only comprehensive but also well-aligned with object
boundaries. We further refine the predictions in an online fashion with a novel
refinement method that takes into account both the class and the color
affinities, in order to generate reliable pseudo masks to supervise the model.
Importantly, our model is end-to-end trained within a single-stage framework
and therefore efficient in terms of the training cost. Through extensive
experiments on PASCAL VOC 2012, we validate the effectiveness of our method in
producing precise and shape-aligned segmentation results. Specifically, our
model surpasses the existing state-of-the-art single-stage approaches by large
margins. Moreover, it also achieves new state-of-the-art performance over
multi-stage approaches when adopted in a simple two-stage pipeline without
bells and whistles.
Related papers
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation (arXiv, 2024-09-24)
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is the main performance bottleneck for open-vocabulary panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocabulary panoptic segmentation.
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion (arXiv, 2024-08-06)
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
- Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals (arXiv, 2024-04-25)
Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global semantic categories within an image corpus without any form of annotation.
We present PriMaPs (Principal Mask Proposals), which decompose images into semantically meaningful masks based on their feature representation.
This allows us to realize unsupervised semantic segmentation by fitting class prototypes to PriMaPs with an expectation-maximization algorithm, PriMaPs-EM.
- PosSAM: Panoptic Open-vocabulary Segment Anything (arXiv, 2024-03-14)
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
- Delving into Shape-aware Zero-shot Semantic Segmentation (arXiv, 2023-04-17)
We present shape-aware zero-shot semantic segmentation.
Inspired by classical spectral methods, we propose to leverage the eigenvectors of Laplacian matrices constructed with self-supervised pixel-wise features.
Our method sets new state-of-the-art performance for zero-shot semantic segmentation on both Pascal and COCO.
- What You See is What You Classify: Black Box Attributions (arXiv, 2022-05-23)
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
- SSA: Semantic Structure Aware Inference for Weakly Pixel-Wise Dense Predictions without Cost (arXiv, 2021-11-05)
Semantic structure aware inference (SSA) is proposed to explore the semantic structure information hidden in different stages of a CNN-based network to generate high-quality CAMs at inference time.
The method is parameter-free and requires no training, so it can be applied to a wide range of weakly supervised pixel-wise dense prediction tasks.
- Per-Pixel Classification is Not All You Need for Semantic Segmentation (arXiv, 2021-07-13)
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks (a minimal sketch of this inference scheme follows the list).
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
- The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation (arXiv, 2020-11-26)
We propose Basis-based Instance segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods.
Our B2Inst leads to consistent improvements and accurately parses out the instance boundaries in a scene.
- LevelSet R-CNN: A Deep Variational Method for Instance Segmentation (arXiv, 2020-07-30)
Currently, many state-of-the-art models are based on the Mask R-CNN framework.
We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations.
We demonstrate the effectiveness of our approach on the COCO and Cityscapes datasets.