Boosting Weakly-Supervised Image Segmentation via Representation,
Transform, and Compensator
- URL: http://arxiv.org/abs/2309.00871v1
- Date: Sat, 2 Sep 2023 09:07:25 GMT
- Title: Boosting Weakly-Supervised Image Segmentation via Representation,
Transform, and Compensator
- Authors: Chunyan Wang, Dong Zhang, Rui Yan
- Abstract summary: Multi-stage training procedures have been widely used in existing WSIS approaches to obtain high-quality pseudo-masks as ground-truth.
We propose a novel single-stage WSIS method that utilizes a siamese network with contrastive learning to improve the quality of class activation maps (CAMs) and achieve a self-refinement process.
Our method significantly outperforms other state-of-the-art methods, achieving 67.2% and 68.76% mIoU on the PASCAL VOC 2012 val and test sets, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised image segmentation (WSIS) is a critical task in computer
vision that relies on image-level class labels. Multi-stage training procedures
have been widely used in existing WSIS approaches to obtain high-quality
pseudo-masks as ground-truth, resulting in significant progress. However,
single-stage WSIS methods have recently gained attention due to their potential
for simplifying training procedures, despite often suffering from low-quality
pseudo-masks that limit their practical applications. To address this issue, we
propose a novel single-stage WSIS method that utilizes a siamese network with
contrastive learning to improve the quality of class activation maps (CAMs) and
achieve a self-refinement process. Our approach employs a cross-representation
refinement method that expands reliable object regions by utilizing different
feature representations from the backbone. Additionally, we introduce a
cross-transform regularization module that learns robust class prototypes for
contrastive learning and captures global context information to feed back rough
CAMs, thereby improving the quality of CAMs. Our final high-quality CAMs are
used as pseudo-masks to supervise the segmentation result. Experimental results
on the PASCAL VOC 2012 dataset demonstrate that our method significantly
outperforms other state-of-the-art methods, achieving 67.2% and 68.76% mIoU on
PASCAL VOC 2012 val set and test set, respectively. Furthermore, our method has
been extended to weakly supervised object localization task, and experimental
results demonstrate that our method continues to achieve very competitive
results.
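The abstract describes learning class prototypes for contrastive learning and using them to refine rough CAMs. As a rough, illustrative sketch (not the authors' exact formulation), prototype-based contrastive learning over CAM pseudo-labels can look like the following in NumPy; the function name, shapes, and temperature are all assumptions:

```python
import numpy as np

def prototype_contrastive_loss(features, cam, temperature=0.1):
    """InfoNCE-style loss pulling each pixel embedding toward the class
    prototype of its CAM pseudo-label. Illustrative sketch only.

    features: (H*W, D) L2-normalized pixel embeddings
    cam:      (H*W, C) class activation scores
    """
    # Hard pseudo-labels from the CAM: each pixel's strongest class.
    labels = cam.argmax(axis=1)                        # (H*W,)
    classes = np.unique(labels)
    # Class prototypes: mean embedding of pixels assigned to each class.
    prototypes = np.stack([features[labels == c].mean(axis=0)
                           for c in classes])          # (K, D)
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    # Similarity of every pixel to every prototype.
    logits = features @ prototypes.T / temperature     # (H*W, K)
    # Cross-entropy against each pixel's own prototype (InfoNCE).
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    target = np.searchsorted(classes, labels)
    return -log_prob[np.arange(len(labels)), target].mean()
```

In a full pipeline the refined CAMs produced under this objective would serve as the pseudo-masks supervising the segmentation head; the details of the cross-representation and cross-transform modules are not recoverable from the abstract.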
Related papers
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is the main performance bottleneck for open-vocabulary panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocabulary panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z) - Incremental Object Detection with CLIP [36.478530086163744]
We propose to use a visual-language model, such as CLIP, to generate text feature embeddings for different class sets.
We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario.
We incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving the detection performance.
arXiv Detail & Related papers (2023-10-13T01:59:39Z) - Localized Region Contrast for Enhancing Self-Supervised Learning in
Medical Image Segmentation [27.82940072548603]
We propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.
Our approach involves identifying super-pixels using Felzenszwalb's algorithm and performing local contrastive learning with a novel contrastive sampling loss.
arXiv Detail & Related papers (2023-04-06T22:43:13Z) - Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z) - Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method achieves superior performance on multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z) - Towards Effective Image Manipulation Detection with Proposal Contrastive
Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance improvements compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - A Practical Contrastive Learning Framework for Single-Image
Super-Resolution [51.422185656787285]
We investigate contrastive learning-based single image super-resolution from two perspectives.
We propose a practical contrastive learning framework for SISR, named PCL-SR.
We re-train existing benchmark methods with our proposed PCL-SR framework and achieve superior performance.
arXiv Detail & Related papers (2021-11-27T15:42:12Z) - Spatially Consistent Representation Learning [12.120041613482558]
We propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks.
We devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region.
On various downstream localization tasks with benchmark datasets, the proposed SCRL shows significant performance improvements.
arXiv Detail & Related papers (2021-03-10T15:23:45Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
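The SEAM entry above proposes consistency regularization on CAMs predicted from transformed images: since fully supervised segmentation is equivariant to spatial transforms, the CAM of a transformed image should match the transformed CAM of the original. A minimal NumPy sketch of that idea, using a horizontal flip as the transform (the `cam_fn` callable and the flip choice are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def equivariance_consistency(cam_fn, image):
    """Consistency loss in the spirit of SEAM: penalize disagreement
    between the CAM of a flipped image and the flipped CAM of the
    original. `cam_fn` maps an image (H, W, 3) to a CAM (H, W, C)."""
    cam_orig = cam_fn(image)
    cam_flip = cam_fn(image[:, ::-1])        # CAM of the flipped image
    # Flip the original CAM and compare: an equivariant model agrees.
    return np.abs(cam_flip - cam_orig[:, ::-1]).mean()
```

A perfectly equivariant `cam_fn` yields zero loss; a model that ignores the transform is penalized, which is the self-supervision signal the paper exploits.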
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.