A Language-Guided Benchmark for Weakly Supervised Open Vocabulary
Semantic Segmentation
- URL: http://arxiv.org/abs/2302.14163v1
- Date: Mon, 27 Feb 2023 21:55:48 GMT
- Title: A Language-Guided Benchmark for Weakly Supervised Open Vocabulary
Semantic Segmentation
- Authors: Prashant Pandey, Mustafa Chasmai, Monish Natarajan, Brejesh Lall
- Abstract summary: We propose a novel weakly supervised OVSS pipeline that can perform ZSS, FSS and Cross-dataset segmentation on novel classes.
The proposed pipeline beats existing methods for weak generalized Zero-Shot and weak Few-Shot semantic segmentation by 39 and 3 mIoU points, respectively.
- Score: 10.054960979867584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing attention is being diverted to data-efficient problem settings
like Open Vocabulary Semantic Segmentation (OVSS), which deals with segmenting
an arbitrary object that may or may not be seen during training. The closest
standard problems related to OVSS are Zero-Shot and Few-Shot Segmentation (ZSS,
FSS) and their Cross-dataset variants where zero to few annotations are needed
to segment novel classes. The existing FSS and ZSS methods utilize fully
supervised pixel-labelled seen classes to segment unseen classes. Pixel-level
labels are hard to obtain, and using weak supervision in the form of
inexpensive image-level labels is often more practical. To this end, we propose
a novel unified weakly supervised OVSS pipeline that can perform ZSS, FSS and
Cross-dataset segmentation on novel classes without using pixel-level labels
for either the base (seen) or the novel (unseen) classes in an inductive
setting. We propose Weakly-Supervised Language-Guided Segmentation Network
(WLSegNet), a novel language-guided segmentation pipeline that i) learns
generalizable context vectors with batch aggregates (mean) to map class prompts
to image features using frozen CLIP (a vision-language model) and ii) decouples
weak ZSS/FSS into weak semantic segmentation and Zero-Shot segmentation. The
learned context vectors avoid overfitting on seen classes during training and
transfer better to novel classes during testing. WLSegNet avoids fine-tuning
and the use of external datasets during training. The proposed pipeline beats
existing methods for weak generalized Zero-Shot and weak Few-Shot semantic
segmentation by 39 and 3 mIoU points, respectively, on PASCAL VOC, and weak
Few-Shot semantic segmentation by 5 mIoU points on MS COCO. On the harder
setting of 2-way 1-shot weak FSS, WLSegNet beats the baselines by 13 and 22
mIoU points on PASCAL VOC and MS COCO, respectively.
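The prompt-learning component described in the abstract (learnable context vectors prepended to class tokens and passed through a frozen CLIP text encoder) resembles CoOp-style prompt learning. Below is a minimal sketch under that assumption: the stand-in frozen text encoder, tensor shapes, and names are illustrative, not the authors' implementation, and the batch-mean aggregation of context vectors is only noted in a comment since the abstract does not spell out its mechanics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    """Learnable context vectors shared across classes (CoOp-style).
    WLSegNet reportedly aggregates these with a batch mean for
    generalization; that step is omitted in this sketch."""
    def __init__(self, n_ctx=8, dim=512, n_classes=20):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Fixed class-name token embeddings; random stand-ins here.
        self.register_buffer("cls_tokens", torch.randn(n_classes, 1, dim))

    def forward(self):
        n_classes = self.cls_tokens.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.cls_tokens], dim=1)   # (C, n_ctx + 1, D)

class FrozenTextEncoder(nn.Module):
    """Stand-in for CLIP's frozen text transformer."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        for p in self.parameters():
            p.requires_grad = False        # only the context vectors train

    def forward(self, prompts):            # (C, L, D) -> (C, D)
        return self.proj(prompts.mean(dim=1))

prompt_learner = PromptLearner()
text_encoder = FrozenTextEncoder()
# Dense image features from a frozen visual backbone (random stand-in).
image_feats = F.normalize(torch.randn(4, 512, 32, 32), dim=1)   # (B, D, H, W)

class_embeds = F.normalize(text_encoder(prompt_learner()), dim=-1)  # (C, D)
# Per-pixel class logits: cosine similarity between pixels and class prompts.
logits = torch.einsum("bdhw,cd->bchw", image_feats, class_embeds) / 0.07
```

Because only the context vectors receive gradients, the frozen encoder keeps CLIP's vision-language alignment intact, which is what lets the learned prompts transfer to novel classes.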
Related papers
- Generalized Category Discovery in Semantic Segmentation [43.99230778597973]
This paper explores a novel setting called Generalized Category Discovery in Semantic Segmentation (GCDSS).
GCDSS aims to segment unlabeled images given prior knowledge from a labeled set of base classes.
In contrast to Novel Category Discovery in Semantic Segmentation (NCDSS), there is no prerequisite mandating that each unlabeled image contain at least one novel class.
arXiv Detail & Related papers (2023-11-20T04:11:16Z)
- Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond [0.0]
We propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net).
It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity.
It achieves new state-of-the-art results in the FSS task, with mIoU of 77.6 on PASCAL-$5^i$ and 59.4 on COCO-$20^i$ in the 1-shot scenario.
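As a rough illustration of the affinity-based prior assembly named above, the sketch below lets each query pixel gather mask evidence from support pixels via feature affinity. The function name, shapes, and normalization are assumptions about the general technique, not PGMA-Net's actual design.

```python
import torch
import torch.nn.functional as F

def assemble_prior(query_feat, support_feat, support_mask, tau=0.1):
    """query_feat: (D, Hq*Wq); support_feat: (D, Hs*Ws); support_mask: (Hs*Ws,)."""
    q = F.normalize(query_feat, dim=0)
    s = F.normalize(support_feat, dim=0)
    affinity = (q.t() @ s) / tau            # (Hq*Wq, Hs*Ws) cosine affinities
    weights = affinity.softmax(dim=-1)      # each query pixel attends to support pixels
    prior = weights @ support_mask.float()  # assemble mask evidence through affinity
    return prior                            # soft foreground prior per query pixel

d, hw = 256, 64 * 64
prior = assemble_prior(torch.randn(d, hw), torch.randn(d, hw),
                       torch.rand(hw) > 0.5)
```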
arXiv Detail & Related papers (2023-08-15T02:46:49Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models.
ZeroSeg distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
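A minimal sketch of the segment-token idea: learnable tokens attend over patch features and are distilled toward the frozen VL image embedding. The attention pooling and cosine loss below are assumptions standing in for ZeroSeg's fuller objectives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentTokens(nn.Module):
    def __init__(self, n_tokens=8, dim=512):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, patch_feats):          # (B, N, D) ViT patch features
        q = self.tokens.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, _ = self.attn(q, patch_feats, patch_feats)
        return out                           # (B, n_tokens, D), one per region

def distill_loss(seg_tokens, vl_image_embed):
    # Pull each segment token toward the frozen VL model's image embedding.
    t = F.normalize(seg_tokens, dim=-1)
    g = F.normalize(vl_image_embed, dim=-1).unsqueeze(1)
    return (1 - (t * g).sum(-1)).mean()

model = SegmentTokens()
loss = distill_loss(model(torch.randn(2, 196, 512)), torch.randn(2, 512))
```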
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and nouns in the captions.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching objects to improve learning efficiency.
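The noun-matching grounding loss could look roughly like the MIL-style sketch below, where each caption noun is claimed by its best-matching region embedding. The pairing rule and temperature are assumptions, not the CGG paper's exact loss; in practice the nouns would come from a POS tagger over the caption, while here both inputs are random stand-ins.

```python
import torch
import torch.nn.functional as F

def grounding_loss(region_embeds, noun_embeds, tau=0.07):
    """region_embeds: (R, D) object queries; noun_embeds: (N, D) caption nouns."""
    r = F.normalize(region_embeds, dim=-1)
    n = F.normalize(noun_embeds, dim=-1)
    sim = r @ n.t() / tau                    # (R, N) region-noun similarity
    # MIL-style grounding: each noun should be claimed by its best-matching
    # region, so we maximize the top entry of its softmax over regions.
    per_noun = sim.softmax(dim=0).max(dim=0).values
    return -per_noun.clamp_min(1e-8).log().mean()

loss = grounding_loss(torch.randn(100, 256), torch.randn(5, 256))
```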
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation [26.079055078561986]
We propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation.
The main idea is to gather patches into semantic regions via learnable centers, trained on text-image pairs.
Experimental results show that our model achieves comparable or superior segmentation accuracy.
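A minimal sketch of gathering patches to learnable centers: soft assignment scores turn K center vectors into assignment-weighted region features. This is a single attention-pooling step under assumed shapes; SegCLIP's actual aggregation module differs in detail.

```python
import torch
import torch.nn as nn

class PatchGather(nn.Module):
    def __init__(self, n_centers=16, dim=512):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, dim) * 0.02)

    def forward(self, patches):                  # (B, N, D) patch features
        logits = patches @ self.centers.t()      # (B, N, K) patch-center scores
        assign = logits.softmax(dim=-1)          # soft assignment of patches
        # Each center becomes the assignment-weighted mean of its patches.
        regions = torch.einsum("bnk,bnd->bkd", assign, patches)
        norm = assign.sum(dim=1, keepdim=True).transpose(1, 2).clamp_min(1e-6)
        return regions / norm, assign            # region features + patch map

regions, assign = PatchGather()(torch.randn(2, 196, 512))
```

The soft-assignment map doubles as a coarse segmentation: each patch belongs most to the center whose region feature best matches a class's text embedding.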
arXiv Detail & Related papers (2022-11-27T12:38:52Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
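The guidance mechanism might be sketched as below: per-pixel queries cross-attend to fused multi-level features before classification. The concatenation fusion and residual head are assumptions, not PCN's exact layout.

```python
import torch
import torch.nn as nn

class GuidedHead(nn.Module):
    def __init__(self, dim=256, n_classes=21, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, pixel_feats, multi_level):
        # pixel_feats: (B, HW, D) per-pixel queries; multi_level: list of (B, Ni, D).
        fused = torch.cat(multi_level, dim=1)          # fuse levels by concatenation
        guided, _ = self.attn(pixel_feats, fused, fused)
        return self.classifier(guided + pixel_feats)   # residual, then per-pixel logits

head = GuidedHead()
logits = head(torch.randn(2, 1024, 256),
              [torch.randn(2, 256, 256), torch.randn(2, 64, 256)])
```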
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive low-rank (LR) representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
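A minimal sketch of the cross-view, low-rank idea: features from two augmented views pass through a shared rank-r bottleneck and their predictions are pushed to agree. The linear factorization and MSE consistency are stand-in assumptions for SLRNet's collective matrix factorization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankHead(nn.Module):
    def __init__(self, dim=256, rank=16, n_classes=21):
        super().__init__()
        self.down = nn.Linear(dim, rank)       # rank-r bottleneck
        self.up = nn.Linear(rank, dim)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, feats):                  # (B, HW, D)
        low_rank = self.up(self.down(feats))   # low-rank reconstruction of features
        return self.cls(low_rank)

head = LowRankHead()
# Features of two augmented views of the same image (random stand-ins).
view_a, view_b = torch.randn(2, 1024, 256), torch.randn(2, 1024, 256)
pa, pb = head(view_a).softmax(-1), head(view_b).softmax(-1)
# Cross-view consistency: each view's prediction supervises the other.
loss = F.mse_loss(pa, pb.detach()) + F.mse_loss(pb, pa.detach())
```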
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
- Decoupling Zero-Shot Semantic Segmentation [46.55494691004304]
Zero-shot semantic segmentation (ZS3) aims to segment novel categories that have not been seen during training.
We propose a simple and effective zero-shot semantic segmentation model, called ZegFormer.
arXiv Detail & Related papers (2021-12-15T06:21:47Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish objects from the background and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
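Entropy-based uncertainty modeling over pseudo-labels can be pictured with the sketch below: confident (low-entropy) pixels keep their pseudo-labels for self-training, while uncertain ones are held out. The quantile-threshold rule is an assumed heuristic, not necessarily the EUMS framework's.

```python
import torch

def split_pseudo_labels(probs, keep_ratio=0.5):
    """probs: (B, C, H, W) softmax outputs on unlabeled images."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (B, H, W)
    thresh = entropy.flatten().quantile(keep_ratio)
    easy = entropy <= thresh               # confident, low-entropy pixels
    pseudo = probs.argmax(dim=1)           # (B, H, W) hard pseudo-labels
    pseudo[~easy] = 255                    # ignore-index for uncertain pixels
    return pseudo, easy

probs = torch.softmax(torch.randn(2, 21, 64, 64), dim=1)
pseudo, easy = split_pseudo_labels(probs)
```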
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- Zero-Shot Semantic Segmentation via Spatial and Multi-Scale Aware Visual Class Embedding [0.0]
We propose a language-model-free zero-shot semantic segmentation framework, the Spatial and Multi-scale aware Visual Class Embedding Network (SM-VCENet).
In experiments, our SM-VCENet outperforms the zero-shot semantic segmentation state of the art by a relative margin.
arXiv Detail & Related papers (2021-11-30T07:39:19Z)
- Few-shot 3D Point Cloud Semantic Segmentation [138.80825169240302]
We propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method.
Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings.
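A minimal sketch of multi-prototype labeling for point clouds: each support class is summarized by several k-means prototypes, and query points take the nearest prototype's label. The clustering, distance choice, and the assumption of at least k support points per class are illustrative; the paper's attention-aware and transductive components are omitted.

```python
import torch

def kmeans(x, k, iters=10):
    """Plain k-means; assumes x has at least k points."""
    centers = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(dim=0)
    return centers

def label_queries(query_feats, support_feats, support_labels, k=3):
    protos, proto_labels = [], []
    for c in support_labels.unique():      # k prototypes per support class
        protos.append(kmeans(support_feats[support_labels == c], k))
        proto_labels += [c] * k
    protos = torch.cat(protos)             # (n_classes * k, D)
    proto_labels = torch.stack(proto_labels)
    nearest = torch.cdist(query_feats, protos).argmin(dim=1)
    return proto_labels[nearest]           # nearest-prototype label per point

labels = label_queries(torch.randn(2048, 64),      # query point features
                       torch.randn(200, 64),       # support point features
                       torch.randint(0, 2, (200,)))
```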
arXiv Detail & Related papers (2020-06-22T08:05:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.