Learning structure-aware semantic segmentation with image-level
supervision
- URL: http://arxiv.org/abs/2104.07216v1
- Date: Thu, 15 Apr 2021 03:33:20 GMT
- Title: Learning structure-aware semantic segmentation with image-level
supervision
- Authors: Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes
- Abstract summary: We argue that the lost structure information in CAM limits its application in downstream semantic segmentation.
We introduce an auxiliary semantic boundary detection module, which penalizes the deteriorated predictions.
Experimental results on the PASCAL-VOC dataset illustrate the effectiveness of the proposed solution.
- Score: 36.40302533324508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with expensive pixel-wise annotations, image-level labels make it
possible to learn semantic segmentation in a weakly-supervised manner. Within
this pipeline, the class activation map (CAM) is obtained and further processed
to serve as a pseudo label to train the semantic segmentation model in a
fully-supervised manner. In this paper, we argue that the lost structure
information in CAM limits its application in downstream semantic segmentation,
leading to deteriorated predictions. Furthermore, the inconsistent class
activation scores inside the same object contradicts the common sense that each
region of the same object should belong to the same semantic category. To
produce sharp prediction with structure information, we introduce an auxiliary
semantic boundary detection module, which penalizes the deteriorated
predictions. Furthermore, we adopt smoothness loss to encourage prediction
inside the object to be consistent. Experimental results on the PASCAL-VOC
dataset illustrate the effectiveness of the proposed solution.
Related papers
- EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation [5.476136494434766]
We introduce EiCue, a technique providing semantic and structural cues through an eigenbasis derived from semantic similarity matrix.
We guide our model to learn object-level representations with intra- and inter-image object-feature consistency.
Experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results.
arXiv Detail & Related papers (2024-03-03T11:24:16Z) - Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z) - Decom--CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map [23.71680014689873]
Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location.
This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM)
Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly.
arXiv Detail & Related papers (2023-05-27T14:33:01Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - SSA: Semantic Structure Aware Inference for Weakly Pixel-Wise Dense
Predictions without Cost [36.27226683586425]
The semantic structure aware inference (SSA) is proposed to explore the semantic structure information hidden in different stages of the CNN-based network to generate high-quality CAM in the model inference.
The proposed method has the advantage of no parameters and does not need to be trained. Therefore, it can be applied to a wide range of weakly-supervised pixel-wise dense prediction tasks.
arXiv Detail & Related papers (2021-11-05T11:07:21Z) - Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty
Regularization [73.03956876752868]
We propose a principled and end-to-end train-able framework to allow the network to pay attention to other parts of the object.
Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy.
arXiv Detail & Related papers (2020-08-03T21:19:08Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z) - Semantics-Driven Unsupervised Learning for Monocular Depth and
Ego-Motion Estimation [33.83396613039467]
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos.
Recent unsupervised learning methods employ photometric errors between synthetic view and actual image as a supervision signal for training.
arXiv Detail & Related papers (2020-06-08T05:55:07Z) - Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning method tends to fail in generating category-specifically discriminative descriptor when the visual features extracted from support images are marginalized in embedding space.
This paper presents an adaptive framework tuning, in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.