Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation
- URL: http://arxiv.org/abs/2004.04581v1
- Date: Thu, 9 Apr 2020 14:57:57 GMT
- Title: Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation
- Authors: Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen
- Abstract summary: We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap between full and weak supervision.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
- Score: 93.83369981759996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-level weakly supervised semantic segmentation is a challenging problem
that has been deeply studied in recent years. Most advanced solutions
exploit class activation maps (CAMs). However, CAMs can hardly serve as
object masks due to the gap between full and weak supervision. In this paper,
we propose a self-supervised equivariant attention mechanism (SEAM) to discover
additional supervision and narrow the gap. Our method is based on the
observation that equivariance is an implicit constraint in fully supervised
semantic segmentation, whose pixel-level labels take the same spatial
transformation as the input images during data augmentation. However, this
constraint is lost on the CAMs trained by image-level supervision. Therefore,
we propose consistency regularization on predicted CAMs from various
transformed images to provide self-supervision for network learning. Moreover,
we propose a pixel correlation module (PCM), which exploits context appearance
information and refines the prediction for the current pixel using its similar
neighbors, leading to further improvement in CAM consistency. Extensive
experiments on the PASCAL VOC 2012 dataset demonstrate that our method outperforms
state-of-the-art methods using the same level of supervision. The code is
released online.
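As an illustration of the two ideas in the abstract, the following is a minimal NumPy sketch, not the authors' released code. All function names (`toy_cam`, `hflip`, `equivariance_loss`, `pixel_correlation_refine`) are hypothetical, and the specific choices (an L1 distance, a horizontal flip as the spatial transformation, a ReLU-clipped cosine affinity) are assumptions made for simplicity. The first part measures how far a CAM predictor is from being equivariant, i.e. how much "transform then predict" differs from "predict then transform". The second part refines each pixel's CAM value as a similarity-weighted average over all pixels.

```python
import numpy as np

def toy_cam(image, weights):
    """Hypothetical stand-in for a CAM network: a per-pixel linear map
    from C input channels to K class scores, followed by ReLU."""
    # image: (H, W, C), weights: (C, K) -> CAM: (H, W, K)
    return np.maximum(image @ weights, 0.0)

def hflip(x):
    """Spatial transformation used in this sketch: flip along the width axis."""
    return x[:, ::-1, :]

def equivariance_loss(cam_fn, image, transform):
    """Mean absolute difference between transform(cam_fn(image)) and
    cam_fn(transform(image)). Zero means cam_fn is equivariant to transform."""
    cam_then_transform = transform(cam_fn(image))
    transform_then_cam = cam_fn(transform(image))
    return float(np.mean(np.abs(cam_then_transform - transform_then_cam)))

def pixel_correlation_refine(cam, features, eps=1e-8):
    """Refine each pixel's CAM using pixels with similar features
    (assumed form: cosine similarity, negatives clipped, rows normalized)."""
    h, w, k = cam.shape
    f = features.reshape(h * w, -1)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)
    affinity = np.maximum(f @ f.T, 0.0)                 # (HW, HW)
    affinity = affinity / (affinity.sum(axis=1, keepdims=True) + eps)
    refined = affinity @ cam.reshape(h * w, k)          # weighted average of CAMs
    return refined.reshape(h, w, k)

rng = np.random.default_rng(0)
image = rng.random((8, 10, 3))
weights = rng.random((3, 4))

# A purely per-pixel model commutes with the flip.
pixelwise = lambda x: toy_cam(x, weights)
# Adding a position-dependent bias breaks equivariance.
ramp = np.linspace(0.0, 1.0, image.shape[1]).reshape(1, -1, 1)
position_biased = lambda x: toy_cam(x, weights) + ramp

print("pixelwise loss:", equivariance_loss(pixelwise, image, hflip))
print("position-biased loss:", equivariance_loss(position_biased, image, hflip))

refined = pixel_correlation_refine(pixelwise(image), image)
print("refined CAM shape:", refined.shape)
```

In a training setting, a term like `equivariance_loss` would be added to the classification loss so that CAMs from transformed inputs are pushed to agree; here it is only evaluated on fixed toy models.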
Related papers
- Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images [16.0258685984844]
Continual learning (CL) breaks with the one-way training paradigm and enables a model to adapt to new data, semantics, and tasks continuously.
We propose a unified continual learning model that leverages multi-task joint learning covering pixel-level classification, instance-level segmentation and image-level perception.
arXiv Detail & Related papers (2024-07-19T12:22:32Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
- Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z)
- Boosting Weakly-Supervised Image Segmentation via Representation, Transform, and Compensator [26.991314511807907]
Multi-stage training procedures have been widely used in existing WSIS approaches to obtain high-quality pseudo-masks as ground-truth.
We propose a novel single-stage WSIS method that utilizes a siamese network with contrastive learning to improve the quality of class activation maps (CAMs) and achieve a self-refinement process.
Our method significantly outperforms other state-of-the-art methods, achieving 67.2% and 68.76% mIoU on the PASCAL VOC 2012 dataset.
arXiv Detail & Related papers (2023-09-02T09:07:25Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation [24.08326440298189]
We propose a novel activation modulation and recalibration scheme for weakly supervised semantic segmentation.
We show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset.
Experiments also reveal that our scheme is plug-and-play and can be incorporated with other approaches to boost their performance.
arXiv Detail & Related papers (2021-12-16T16:26:14Z)
- Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation [16.560870740946275]
Explicit Pseudo-pixel Supervision (EPS) learns from pixel-level feedback by combining two weak supervisions.
We devise a joint training strategy to fully utilize the complementary relationship between the two sources of information.
Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks.
arXiv Detail & Related papers (2021-05-19T07:31:11Z)
- Weakly supervised segmentation with cross-modality equivariant constraints [7.757293476741071]
Weakly supervised learning has emerged as an appealing alternative to alleviate the need for large labeled datasets in semantic segmentation.
We present a novel learning strategy that leverages self-supervision in a multi-modal image scenario to significantly enhance original CAMs.
Our approach outperforms relevant recent literature under the same learning conditions.
arXiv Detail & Related papers (2021-04-06T13:14:20Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.