Self-supervised Image-specific Prototype Exploration for Weakly
Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2203.02909v1
- Date: Sun, 6 Mar 2022 09:01:03 GMT
- Title: Self-supervised Image-specific Prototype Exploration for Weakly
Supervised Semantic Segmentation
- Authors: Qi Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie
- Abstract summary: Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to low annotation costs.
We propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss.
Our SIPE achieves new state-of-the-art performance using only image-level labels.
- Score: 72.33139350241044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels
has attracted much attention due to low annotation costs. Existing methods
often rely on Class Activation Mapping (CAM) that measures the correlation
between image pixels and classifier weight. However, the classifier focuses
only on the discriminative regions while ignoring other useful information in
each image, resulting in incomplete localization maps. To address this issue,
we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that
consists of an Image-specific Prototype Exploration (IPE) and a
General-Specific Consistency (GSC) loss. Specifically, IPE tailors prototypes
for every image to capture complete regions, forming our Image-Specific CAM
(IS-CAM), which is realized in two sequential steps. In addition, GSC is
proposed to construct the consistency of general CAM and our specific IS-CAM,
which further optimizes the feature representation and empowers a
self-correction ability in prototype exploration. Extensive experiments are
conducted on the PASCAL VOC 2012 and MS COCO 2014 segmentation benchmarks, and
the results show that our SIPE achieves new state-of-the-art performance using only
image-level labels. The code is available at
https://github.com/chenqi1126/SIPE.
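The following is a minimal PyTorch sketch contrasting a standard classifier-weight CAM with an image-specific, prototype-based CAM and a GSC-style consistency loss. The function names, the quantile-based seed selection, the cosine-similarity scoring, and the L1 consistency form are illustrative assumptions of this sketch, not the authors' exact two-step implementation; see the linked repository for the real code.

```python
import torch
import torch.nn.functional as F

def classifier_cam(feats, fc_weight, class_idx):
    """Vanilla CAM: correlation between pixel features and classifier weights.

    feats:     (B, C, H, W) backbone feature map
    fc_weight: (num_classes, C) final classifier weights
    """
    w = fc_weight[class_idx].view(1, -1, 1, 1)            # (1, C, 1, 1)
    cam = (feats * w).sum(dim=1, keepdim=True)            # (B, 1, H, W)
    return F.relu(cam)

def image_specific_cam(feats, cam, seed_quantile=0.8):
    """Illustrative IS-CAM: build a per-image prototype from high-confidence
    CAM seeds, then score every pixel by similarity to that prototype.
    The quantile-based seeding is an assumption of this sketch."""
    B, C, H, W = feats.shape
    cam_flat = cam.flatten(2)                              # (B, 1, H*W)
    thresh = torch.quantile(cam_flat, seed_quantile, dim=2, keepdim=True)
    seeds = (cam_flat >= thresh).float()                   # (B, 1, H*W)
    f_flat = feats.flatten(2)                              # (B, C, H*W)
    proto = (f_flat * seeds).sum(2) / seeds.sum(2).clamp(min=1)  # (B, C)
    is_cam = F.cosine_similarity(f_flat, proto.unsqueeze(2), dim=1)
    return F.relu(is_cam.view(B, 1, H, W))

def gsc_loss(cam, is_cam):
    """Consistency between the general CAM and the image-specific CAM
    (an L1 form on max-normalized maps is assumed here for illustration)."""
    norm = lambda x: x / (x.flatten(1).amax(1).view(-1, 1, 1, 1) + 1e-5)
    return (norm(cam) - norm(is_cam)).abs().mean()
```

In this sketch, the prototype is simply the average feature of the most confident CAM pixels, so pixels that resemble the object but fall outside the discriminative region still score highly in the IS-CAM, while the consistency loss pulls the two maps together during training.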
Related papers
- Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate action unit (AU) cues into classifier training, enabling the training of deep interpretable models.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- COMNet: Co-Occurrent Matching for Weakly Supervised Semantic Segmentation [13.244183864948848]
We propose a novel Co-Occurrent Matching Network (COMNet), which can improve the quality of the CAMs and force the network to pay attention to all parts of objects.
Specifically, we perform inter-matching on paired images that contain common classes to enhance the corresponding areas, and construct intra-matching on a single image to propagate the semantic features across the object regions.
The experiments on the Pascal VOC 2012 and MS-COCO datasets show that our network can effectively boost the performance of the baseline model and achieve new state-of-the-art performance.
arXiv Detail & Related papers (2023-09-29T03:55:24Z)
- Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels to obtain superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification because sample pairs are constructed with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
- Puzzle-CAM: Improved localization via matching partial and full features [0.5482532589225552]
Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the performance gap between pixel-level supervision and image-level supervision.
Most advanced approaches use class activation maps (CAMs) to generate pseudo-labels for training the segmentation network.
We propose Puzzle-CAM, a process that minimizes the differences between the features from separate patches and the whole image (a minimal sketch of this idea appears after this list).
In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 dataset.
arXiv Detail & Related papers (2021-01-27T08:19:38Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAM and active learning (AL) methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
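To make the Puzzle-CAM entry above concrete, here is a minimal PyTorch sketch of a patch-versus-whole reconstruction consistency loss. The 2x2 tiling, the feature-level (rather than CAM-level) comparison, and the L1 penalty are assumptions of this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def puzzle_consistency(net, image):
    """Sketch of a Puzzle-CAM-style loss: features of re-assembled 2x2
    tiles should match features of the whole image. `net` is assumed to
    map an image (B, 3, H, W) to a feature map (B, C, h, w)."""
    B, _, H, W = image.shape
    h2, w2 = H // 2, W // 2
    # Cut the image into four non-overlapping tiles (row-major order).
    tiles = [image[:, :, i*h2:(i+1)*h2, j*w2:(j+1)*w2]
             for i in range(2) for j in range(2)]
    tile_feats = [net(t) for t in tiles]
    # Re-assemble the tile features into one map: top row, bottom row.
    top = torch.cat(tile_feats[:2], dim=3)
    bottom = torch.cat(tile_feats[2:], dim=3)
    merged = torch.cat([top, bottom], dim=2)
    full = net(image)
    # Merged tile features may differ slightly in spatial size; align them.
    merged = F.interpolate(merged, size=full.shape[2:], mode='bilinear',
                           align_corners=False)
    return (merged - full).abs().mean()
```

Because each tile sees only part of the object, matching the merged tile features to the full-image features pushes the network to activate on all object parts rather than only the most discriminative one.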
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.