Multi-Granularity Denoising and Bidirectional Alignment for Weakly
Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2305.05154v1
- Date: Tue, 9 May 2023 03:33:43 GMT
- Title: Multi-Granularity Denoising and Bidirectional Alignment for Weakly
Supervised Semantic Segmentation
- Authors: Tao Chen, Yazhou Yao and Jinhui Tang
- Abstract summary: We propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy label and multi-class generalization issues.
The MDBA model reaches an mIoU of 69.5% and 70.2% on the validation and test sets of the PASCAL VOC 2012 dataset.
- Score: 75.32213865436442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) models relying on class
activation maps (CAMs) have achieved desirable performance compared to
non-CAM-based counterparts. However, to make the WSSS task feasible, pseudo
labels must be generated by expanding the seeds from CAMs, which is complex and
time-consuming, thus hindering the design of efficient end-to-end
(single-stage) WSSS approaches. To tackle this dilemma, we resort to the
off-the-shelf and readily accessible saliency maps for directly obtaining
pseudo labels given the image-level class labels. Nevertheless, the salient
regions may contain noisy labels and cannot seamlessly fit the target objects,
and saliency maps can only be approximated as pseudo labels for simple images
containing single-class objects. As such, a segmentation model trained on
these simple images cannot generalize well to complex images containing
multi-class objects. To this end, we propose an end-to-end multi-granularity
denoising and bidirectional alignment (MDBA) model, to alleviate the noisy
label and multi-class generalization issues. Specifically, we propose the
online noise filtering and progressive noise detection modules to tackle
image-level and pixel-level noise, respectively. Moreover, a bidirectional
alignment mechanism is proposed to reduce the data distribution gap at both
input and output space with simple-to-complex image synthesis and
complex-to-simple adversarial learning. MDBA reaches an mIoU of 69.5\% and
70.2\% on the validation and test sets of the PASCAL VOC 2012 dataset. The
source code and models have been made available at
\url{https://github.com/NUST-Machine-Intelligence-Laboratory/MDBA}.
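The abstract's headline numbers are mean Intersection-over-Union (mIoU) scores, the standard PASCAL VOC segmentation metric: per-class IoU averaged over classes. A minimal sketch of the metric (an illustrative implementation, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union between predicted and ground-truth
    label maps (integer arrays of identical shape)."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

Benchmark implementations typically accumulate intersections and unions over the whole dataset before dividing, rather than averaging per-image scores.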
Related papers
- Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach [7.012760526318993]
Weakly Supervised Semantic Segmentation (WSSS) offers a cost-efficient workaround to extensive pixel-level labeling.
Existing WSSS methods have difficulties in learning the boundaries of objects, leading to poor segmentation results.
We propose a novel and effective framework that addresses these issues by leveraging visual foundation models inside the bounding box.
arXiv Detail & Related papers (2024-05-10T16:42:25Z) - Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly
Supervised Semantic Segmentation [30.812323329239614]
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for laborious pixel-level annotation by using only image-level annotation.
Most existing methods rely on Class Activation Maps (CAM) to derive pixel-level pseudo-labels.
We introduce a simple yet effective method harnessing the Segment Anything Model (SAM), a class-agnostic foundation model capable of producing fine-grained instance masks of objects, parts, and subparts.
arXiv Detail & Related papers (2023-05-09T23:24:09Z) - Boosting Few-shot Fine-grained Recognition with Background Suppression
and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z) - Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly
Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z) - One Weird Trick to Improve Your Semi-Weakly Supervised Semantic
Segmentation Model [8.388356030608886]
Semi-weakly supervised semantic segmentation (SWSSS) aims to train a model to identify objects in images based on a small number of images with pixel-level labels, and many more images with only image-level labels.
Most existing SWSSS algorithms extract pixel-level pseudo-labels from an image classifier - a very difficult task to do well.
We propose a method called prediction filtering, which instead of extracting pseudo-labels, just uses the classifier as a classifier.
Adding this simple post-processing method to baselines gives results competitive with or better than prior SWSSS algorithms.
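The prediction-filtering idea above — "just uses the classifier as a classifier" — suggests suppressing segmentation scores for classes the image-level classifier deems absent. A hypothetical sketch of this post-processing (an interpretation of the summary, not the paper's implementation; the `threshold` parameter is an assumption):

```python
import numpy as np

def prediction_filtering(seg_logits, class_probs, threshold=0.5):
    """Suppress per-pixel scores of classes whose image-level probability
    is below `threshold`, then take the per-pixel argmax.

    seg_logits : (C, H, W) per-pixel class scores from the segmenter.
    class_probs: (C,) image-level probabilities from the classifier.
    The background class (index 0) is never suppressed.
    """
    keep = class_probs >= threshold
    keep[0] = True  # always keep background as a fallback
    # Broadcast the (C,) keep mask over the spatial dimensions.
    filtered = np.where(keep[:, None, None], seg_logits, -np.inf)
    return filtered.argmax(axis=0)
```

Pixels whose top-scoring class is filtered out fall back to the best remaining class, which is why such filtering can only remove false-positive classes, never introduce new ones.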
arXiv Detail & Related papers (2022-05-02T21:46:41Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Semi-Supervised Domain Adaptation with Prototypical Alignment and
Consistency Learning [86.6929930921905]
This paper studies how much a few labeled target samples can further help address domain shifts.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.