A Self-Training Framework Based on Multi-Scale Attention Fusion for
Weakly Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2305.05841v1
- Date: Wed, 10 May 2023 02:16:12 GMT
- Title: A Self-Training Framework Based on Multi-Scale Attention Fusion for
Weakly Supervised Semantic Segmentation
- Authors: Guoqing Yang, Chuang Zhu, Yu Zhang
- Abstract summary: We propose a self-training method that utilizes fused multi-scale class-aware attention maps.
We collect information from attention maps of different scales and obtain multi-scale attention maps.
We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas.
- Score: 7.36778096476552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) based on image-level labels is
challenging since it is hard to obtain complete semantic regions. To address
this issue, we propose a self-training method that utilizes fused multi-scale
class-aware attention maps. Our observation is that attention maps of different
scales contain rich complementary information, especially for large and small
objects. Therefore, we collect information from attention maps of different
scales and obtain multi-scale attention maps. We then apply denoising and
reactivation strategies to enhance the potential regions and reduce noisy
areas. Finally, we use the refined attention maps to retrain the network.
Experiments show that our method enables the model to extract rich semantic
information from multi-scale images and achieves 72.4% mIoU on both the
PASCAL VOC 2012 validation and test sets. The code is available at
https://bupt-ai-cz.github.io/SMAF.
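As a rough sketch of the pipeline described in the abstract, the snippet below fuses class-aware attention maps computed at several input scales and applies a simple threshold-based cleanup. The mean-fusion rule, the threshold, and all names are illustrative assumptions; the paper's actual denoising and reactivation strategies are not reproduced here.
```python
import torch
import torch.nn.functional as F

def fuse_multiscale_cams(cams, out_size, noise_thresh=0.1):
    """Fuse class-aware attention maps (CAMs) computed at several scales.

    cams: list of tensors, each of shape (C, H_s, W_s), one per input scale.
    out_size: (H, W) of the fused map.
    noise_thresh: hypothetical cutoff; activations below this fraction of the
    per-class maximum are zeroed as a simple stand-in for denoising.
    """
    # Resize every scale's CAM to a common resolution.
    resized = [
        F.interpolate(c.unsqueeze(0), size=out_size,
                      mode="bilinear", align_corners=False).squeeze(0)
        for c in cams
    ]
    fused = torch.stack(resized, dim=0).mean(dim=0)  # average across scales

    # Normalize each class map to [0, 1], then suppress low-confidence noise.
    cmax = fused.flatten(1).max(dim=1).values.clamp(min=1e-6)
    fused = fused / cmax[:, None, None]
    fused[fused < noise_thresh] = 0.0
    return fused
```
The refined maps would then serve as pseudo-labels for the retraining step mentioned in the abstract.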
Related papers
- iSeg: An Iterative Refinement-based Framework for Training-free Segmentation [85.58324416386375]
We present a deep experimental analysis of iteratively refining the cross-attention map with the self-attention map.
We propose an effective iterative refinement framework for training-free segmentation, named iSeg.
Our proposed iSeg achieves an absolute gain of 3.8% mIoU over the best existing training-free approach in the literature.
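The core idea, propagating a cross-attention map through the self-attention map over several iterations, can be sketched as a simple fixed-point loop; the shapes, the iteration count, and the per-class renormalization below are assumptions, not the authors' exact formulation.
```python
import torch

def refine_attention(cross_attn, self_attn, n_iters=3):
    """Iteratively refine a cross-attention map with a self-attention map.

    cross_attn: (N, C) -- nonnegative attention of N image tokens over C
                class/text tokens.
    self_attn:  (N, N) -- token-to-token self-attention (rows sum to 1).
    Each step spreads class evidence between mutually similar tokens.
    """
    attn = cross_attn
    for _ in range(n_iters):
        attn = self_attn @ attn  # propagate activations along similarity
        # Renormalize each class map so the iteration stays bounded.
        attn = attn / attn.sum(dim=0, keepdim=True).clamp(min=1e-6)
    return attn
```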
arXiv Detail & Related papers (2024-09-05T03:07:26Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
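A generic pixel-affinity refinement step in this spirit might look like the sketch below, where an affinity matrix computed from feature similarity redistributes class scores; this is a simplified stand-in, not AuxSegNet+'s actual cross-task mechanism.
```python
import torch
import torch.nn.functional as F

def affinity_refine(features, logits):
    """Refine per-pixel predictions with a feature-similarity affinity matrix.

    features: (D, H, W) feature map; logits: (C, H, W) raw class scores.
    Note: the (HW, HW) affinity matrix is only practical for small maps.
    """
    D, H, W = features.shape
    f = F.normalize(features.reshape(D, H * W), dim=0)  # unit-norm per pixel
    affinity = torch.softmax(f.t() @ f, dim=-1)         # (HW, HW) similarities
    refined = affinity @ logits.reshape(-1, H * W).t()  # propagate class scores
    return refined.t().reshape(-1, H, W)
```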
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- SLiMe: Segment Like Me [24.254744102347413]
We propose SLiMe to segment images at any desired granularity using as few as one annotated sample.
We carried out an extensive set of experiments examining various design factors and showed that SLiMe outperforms other existing one-shot and few-shot segmentation methods.
arXiv Detail & Related papers (2023-09-06T17:39:05Z)
- Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
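The basic building block, computing self-attention within windows taken at different shifts, rests on a partition step like the hypothetical one below; the roll-based shift and the divisibility requirement are simplifying assumptions.
```python
import torch

def window_partition(x, win, shift=0):
    """Split a feature map into (shifted) square windows for local attention.

    x: (H, W, D), with H and W divisible by `win`. Rolling by `shift` before
    splitting lets windows at different offsets cover different pixel
    neighbourhoods -- the basic mechanism behind multi-shifted windows.
    """
    if shift:
        x = torch.roll(x, shifts=(-shift, -shift), dims=(0, 1))
    H, W, D = x.shape
    x = x.reshape(H // win, win, W // win, win, D)
    # (num_windows, win * win, D): one token group per window.
    return x.permute(0, 2, 1, 3, 4).reshape(-1, win * win, D)
```
Self-attention (e.g. a standard multi-head attention layer) can then run independently within each window group, and the feature maps from different shifts are aggregated by one of the paper's three strategies.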
arXiv Detail & Related papers (2022-07-10T07:36:36Z)
- L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation [67.26984058377435]
We present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.
Our framework directs the global network to learn, from a global view, the rich object-detail knowledge captured by local views.
Experiments show that our method attains 72.1% and 44.2% mIoU on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
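A minimal version of such a local-to-global transfer objective, assuming class-aware attention maps from a global image and from random local crops, could be written as below; the L1 form and the crop bookkeeping are illustrative, not L2G's exact loss.
```python
import torch
import torch.nn.functional as F

def local_to_global_loss(global_cam, local_cams, boxes):
    """Distill attention knowledge from local crops into a global CAM.

    global_cam: (C, H, W) attention from the global network.
    local_cams: list of (C, h, w) attentions from crops of the same image.
    boxes: list of (y0, x0, y1, x1) crop coordinates within global_cam.
    """
    loss = 0.0
    for cam, (y0, x0, y1, x1) in zip(local_cams, boxes):
        region = global_cam[:, y0:y1, x0:x1]
        # Resize the local map onto the matching global region.
        cam = F.interpolate(cam.unsqueeze(0), size=region.shape[1:],
                            mode="bilinear", align_corners=False).squeeze(0)
        loss = loss + (region - cam.detach()).abs().mean()  # local maps teach
    return loss / max(len(local_cams), 1)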
arXiv Detail & Related papers (2022-04-07T04:31:32Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
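A common way to encourage dense equivariant representations is a consistency loss under a known spatial transform; the sketch below uses a horizontal flip as the transform and is a generic stand-in, not LEAD's distribution-alignment objective.
```python
import torch
import torch.nn.functional as F

def equivariance_loss(encoder, img):
    """Dense equivariance: features of a flipped image should equal the
    flipped features of the original image.

    encoder: any module mapping (N, C, H, W) images to (N, D, H', W') features.
    """
    f = encoder(img)
    f_flip = encoder(torch.flip(img, dims=[-1]))  # horizontally flipped input
    return F.mse_loss(f_flip, torch.flip(f, dims=[-1]))
```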
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Segmentation of VHR EO Images using Unsupervised Learning [19.00071868539993]
We propose an unsupervised semantic segmentation method that can be trained using just a single unlabeled scene.
The proposed method exploits the large spatial coverage of such scenes to sample smaller patches from the larger scene.
After unsupervised training on the target image/scene, the model automatically segregates the major classes present in the scene and produces the segmentation map.
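The patch-sampling step could look like the hypothetical helper below, which draws random square crops from a single large scene for unsupervised training; names and shapes are assumptions, and the paper's training loop is not reproduced.
```python
import numpy as np

def sample_patches(scene, patch_size, n_patches, rng=None):
    """Randomly crop square training patches from one large unlabeled scene.

    scene: (H, W, C) array for a single VHR image.
    Returns an array of shape (n_patches, patch_size, patch_size, C).
    """
    rng = rng or np.random.default_rng()
    H, W, _ = scene.shape
    ys = rng.integers(0, H - patch_size, size=n_patches)
    xs = rng.integers(0, W - patch_size, size=n_patches)
    return np.stack([scene[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])
```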
arXiv Detail & Related papers (2021-07-09T11:42:48Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
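A plain co-attention between two images' feature maps, the general mechanism behind such cross-image similarity mining, can be sketched as follows; the paper's classifier integration and the similarity/difference pairing are not reproduced.
```python
import torch

def co_attention(fa, fb):
    """Cross-image co-attention between two feature maps.

    fa, fb: (D, H, W). Returns fa's features re-expressed as attention-weighted
    mixtures of fb's features, so semantics shared by both images are
    highlighted.
    """
    D, H, W = fa.shape
    a = fa.reshape(D, H * W)                 # (D, N)
    b = fb.reshape(D, H * W)
    attn = torch.softmax(a.t() @ b, dim=-1)  # (N, N): fa pixels attend to fb
    return (attn @ b.t()).t().reshape(D, H, W)
```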
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Learning to Segment from Scribbles using Multi-scale Adversarial Attention Gates [16.28285034098361]
Weakly-supervised learning can train models by relying on weaker forms of annotation, such as scribbles.
We train a multi-scale GAN to generate realistic segmentation masks at multiple resolutions, while we use scribbles to learn their correct position in the image.
Central to the model's success is a novel attention gating mechanism, which we condition with adversarial signals to act as a shape prior.
arXiv Detail & Related papers (2020-07-02T14:39:08Z)
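A bare-bones attention gate of the kind conditioned above with adversarial signals can be sketched as a learned soft spatial mask; the module below is a minimal illustration and omits the multi-scale GAN and the adversarial conditioning.
```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """A 1x1 convolution predicts a soft spatial mask that multiplicatively
    suppresses activations outside the object of interest."""

    def __init__(self, channels):
        super().__init__()
        self.to_mask = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                        # x: (N, C, H, W)
        mask = torch.sigmoid(self.to_mask(x))    # (N, 1, H, W) in [0, 1]
        return x * mask, mask                    # gated features + mask
```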