Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information
- URL: http://arxiv.org/abs/2405.04913v1
- Date: Wed, 8 May 2024 09:35:26 GMT
- Title: Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information
- Authors: Qi Lai, Chi-Man Vong,
- Abstract summary: Weakly supervised semantic segmentation (WSSS) aims at learning a semantic segmentation model with only image-level tags.
Most current WSSS methods focus on a limited single image (pixel-wise) information while ignoring the valuable inter-image (semantic-wise) information.
- Score: 10.77139542242678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) aims at learning a semantic segmentation model with only image-level tags. Despite intensive research on deep learning approaches over a decade, there is still a significant performance gap between WSSS and full semantic segmentation. Most current WSSS methods always focus on a limited single image (pixel-wise) information while ignoring the valuable inter-image (semantic-wise) information. From this perspective, a novel end-to-end WSSS framework called DSCNet is developed along with two innovations: i) pixel-wise group contrast and semantic-wise graph contrast are proposed and introduced into the WSSS framework; ii) a novel dual-stream contrastive learning (DSCL) mechanism is designed to jointly handle pixel-wise and semantic-wise context information for better WSSS performance. Specifically, the pixel-wise group contrast learning (PGCL) and semantic-wise graph contrast learning (SGCL) tasks form a more comprehensive solution. Extensive experiments on PASCAL VOC and MS COCO benchmarks verify the superiority of DSCNet over SOTA approaches and baseline models.
Related papers
- Dcl-Net: Dual Contrastive Learning Network for Semi-Supervised
Multi-Organ Segmentation [12.798684146496754]
We propose a two-stage Dual Contrastive Learning Network for semi-supervised MoS.
In Stage 1, we develop a similarity-guided global contrastive learning to explore the implicit continuity and similarity among images.
In Stage 2, we present an organ-aware local contrastive learning to further attract the class representations.
arXiv Detail & Related papers (2024-03-06T07:39:33Z) - Weakly-Supervised Semantic Segmentation with Image-Level Labels: from
Traditional Models to Foundation Models [33.690846523358836]
Weakly-supervised semantic segmentation (WSSS) is an effective solution to avoid pixel-level labels.
We focus on the WSSS with image-level labels, which is the most challenging form of WSSS.
We investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS.
arXiv Detail & Related papers (2023-10-19T07:16:54Z) - ISLE: A Framework for Image Level Semantic Segmentation Ensemble [5.137284292672375]
Conventional semantic segmentation networks require massive pixel-wise annotated labels to reach state-of-the-art prediction quality.
We propose ISLE, which employs an ensemble of the "pseudo-labels" for a given set of different semantic segmentation techniques on a class-wise level.
We reach up to 2.4% improvement over ISLE's individual components.
arXiv Detail & Related papers (2023-03-14T13:36:36Z) - In-N-Out Generative Learning for Dense Unsupervised Video Segmentation [89.21483504654282]
In this paper, we focus on the unsupervised Video Object (VOS) task which learns visual correspondence from unlabeled videos.
We propose the In-aNd-Out (INO) generative learning from a purely generative perspective, which captures both high-level and fine-grained semantics.
Our INO outperforms previous state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-03-29T07:56:21Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network ( SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS)
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Object discovery and representation networks [78.16003886427885]
We propose a self-supervised learning paradigm that discovers the structure encoded in priors by itself.
Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision.
arXiv Detail & Related papers (2022-03-16T17:42:55Z) - MuSCLe: A Multi-Strategy Contrastive Learning Framework for Weakly
Supervised Semantic Segmentation [39.858844102571176]
Weakly supervised semantic segmentation (WSSS) relies on weak labels such as image level annotations rather than pixel level annotations required by supervised semantic segmentation (SSS) methods.
We propose a novel Multi-Strategy Contrastive Learning (MuSCLe) framework to obtain enhanced feature representations and improve WSSS performance.
arXiv Detail & Related papers (2022-01-18T14:38:50Z) - CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image framework (CRIS)
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z) - CTNet: Context-based Tandem Network for Semantic Segmentation [77.4337867789772]
This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual information.
To further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated.
arXiv Detail & Related papers (2021-04-20T07:33:11Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.