A Language-Guided Benchmark for Weakly Supervised Open Vocabulary
Semantic Segmentation
- URL: http://arxiv.org/abs/2302.14163v1
- Date: Mon, 27 Feb 2023 21:55:48 GMT
- Title: A Language-Guided Benchmark for Weakly Supervised Open Vocabulary
Semantic Segmentation
- Authors: Prashant Pandey, Mustafa Chasmai, Monish Natarajan, Brejesh Lall
- Abstract summary: We propose a novel weakly supervised OVSS pipeline that can perform ZSS, FSS and Cross-dataset segmentation on novel classes.
The proposed pipeline beats existing methods for weak generalized Zero-Shot and weak Few-Shot semantic segmentation by 39 and 3 mIoU points, respectively.
- Score: 10.054960979867584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing attention is being diverted to data-efficient problem settings
like Open Vocabulary Semantic Segmentation (OVSS), which deals with segmenting
an arbitrary object that may or may not be seen during training. The closest
standard problems related to OVSS are Zero-Shot and Few-Shot Segmentation (ZSS,
FSS) and their Cross-dataset variants where zero to few annotations are needed
to segment novel classes. The existing FSS and ZSS methods utilize fully
supervised pixel-labelled seen classes to segment unseen classes. Pixel-level
labels are hard to obtain, and using weak supervision in the form of
inexpensive image-level labels is often more practical. To this end, we propose
a novel unified weakly supervised OVSS pipeline that can perform ZSS, FSS and
Cross-dataset segmentation on novel classes without using pixel-level labels
for either the base (seen) or the novel (unseen) classes in an inductive
setting. We propose Weakly-Supervised Language-Guided Segmentation Network
(WLSegNet), a novel language-guided segmentation pipeline that i) learns
generalizable context vectors with batch aggregates (mean) to map class prompts
to image features using frozen CLIP (a vision-language model) and ii) decouples
weak ZSS/FSS into weak semantic segmentation and Zero-Shot segmentation. The
learned context vectors avoid overfitting on seen classes during training and
transfer better to novel classes during testing. WLSegNet avoids fine-tuning
and the use of external datasets during training. The proposed pipeline beats
existing methods for weak generalized Zero-Shot and weak Few-Shot semantic
segmentation by 39 and 3 mIoU points, respectively, on PASCAL VOC, and weak
Few-Shot semantic segmentation by 5 mIoU points on MS COCO. On the harder
setting of 2-way 1-shot weak FSS, WLSegNet beats the baselines by 13 and 22
mIoU points on PASCAL VOC and MS COCO, respectively.
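The prompt-learning component described in the abstract (learnable context vectors prepended to class tokens and passed through a frozen CLIP text encoder) resembles CoOp-style prompt learning. Below is a minimal sketch under that assumption: the stand-in frozen text encoder, tensor shapes, and names are illustrative, not the authors' implementation, and the batch-mean aggregation of context vectors is only noted in a comment since the abstract does not spell out its mechanics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    """Learnable context vectors shared across classes (CoOp-style).
    WLSegNet reportedly aggregates these with a batch mean for
    generalization; that step is omitted in this sketch."""
    def __init__(self, n_ctx=8, dim=512, n_classes=20):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Fixed class-name token embeddings; random stand-ins here.
        self.register_buffer("cls_tokens", torch.randn(n_classes, 1, dim))

    def forward(self):
        n_classes = self.cls_tokens.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.cls_tokens], dim=1)   # (C, n_ctx + 1, D)

class FrozenTextEncoder(nn.Module):
    """Stand-in for CLIP's frozen text transformer."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        for p in self.parameters():
            p.requires_grad = False        # only the context vectors train

    def forward(self, prompts):            # (C, L, D) -> (C, D)
        return self.proj(prompts.mean(dim=1))

prompt_learner = PromptLearner()
text_encoder = FrozenTextEncoder()
# Dense image features from a frozen visual backbone (random stand-in).
image_feats = F.normalize(torch.randn(4, 512, 32, 32), dim=1)   # (B, D, H, W)

class_embeds = F.normalize(text_encoder(prompt_learner()), dim=-1)  # (C, D)
# Per-pixel class logits: cosine similarity between pixels and class prompts.
logits = torch.einsum("bdhw,cd->bchw", image_feats, class_embeds) / 0.07
```

Because only the context vectors receive gradients, the frozen encoder keeps CLIP's vision-language alignment intact, which is what lets the learned prompts transfer to novel classes.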
Related papers
- Generalized Category Discovery in Semantic Segmentation [43.99230778597973]
This paper explores a novel setting called Generalized Category Discovery in Semantic Segmentation (GCDSS).
GCDSS aims to segment unlabeled images given prior knowledge from a labeled set of base classes.
In contrast to Novel Category Discovery in Semantic Segmentation (NCDSS), there is no prerequisite mandating that each unlabeled image contain at least one novel class.
arXiv Detail & Related papers (2023-11-20T04:11:16Z)
- Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond [0.0]
We propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net).
It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity.
It achieves new state-of-the-art results in the FSS task, with mIoU of 77.6 on PASCAL-$5^i$ and 59.4 on COCO-$20^i$ in the 1-shot scenario.
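As a rough illustration of the affinity-based prior assembly named above, the sketch below lets each query pixel gather mask evidence from support pixels via feature affinity. The function name, shapes, and normalization are assumptions about the general technique, not PGMA-Net's actual design.

```python
import torch
import torch.nn.functional as F

def assemble_prior(query_feat, support_feat, support_mask, tau=0.1):
    """query_feat: (D, Hq*Wq); support_feat: (D, Hs*Ws); support_mask: (Hs*Ws,)."""
    q = F.normalize(query_feat, dim=0)
    s = F.normalize(support_feat, dim=0)
    affinity = (q.t() @ s) / tau            # (Hq*Wq, Hs*Ws) cosine affinities
    weights = affinity.softmax(dim=-1)      # each query pixel attends to support pixels
    prior = weights @ support_mask.float()  # assemble mask evidence through affinity
    return prior                            # soft foreground prior per query pixel

d, hw = 256, 64 * 64
prior = assemble_prior(torch.randn(d, hw), torch.randn(d, hw),
                       torch.rand(hw) > 0.5)
```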
arXiv Detail & Related papers (2023-08-15T02:46:49Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models.
ZeroSeg distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
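A minimal sketch of the segment-token idea: learnable tokens attend over patch features and are distilled toward the frozen VL image embedding. The attention pooling and cosine loss below are assumptions standing in for ZeroSeg's fuller objectives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentTokens(nn.Module):
    def __init__(self, n_tokens=8, dim=512):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, patch_feats):          # (B, N, D) ViT patch features
        q = self.tokens.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, _ = self.attn(q, patch_feats, patch_feats)
        return out                           # (B, n_tokens, D), one per region

def distill_loss(seg_tokens, vl_image_embed):
    # Pull each segment token toward the frozen VL model's image embedding.
    t = F.normalize(seg_tokens, dim=-1)
    g = F.normalize(vl_image_embed, dim=-1).unsqueeze(1)
    return (1 - (t * g).sum(-1)).mean()

model = SegmentTokens()
loss = distill_loss(model(torch.randn(2, 196, 512)), torch.randn(2, 512))
```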
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and nouns in the captions.
We devise a joint Caption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that focuses only on matching objects to improve learning efficiency.
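The noun-matching grounding loss could look roughly like the MIL-style sketch below, where each caption noun is claimed by its best-matching region embedding. The pairing rule and temperature are assumptions, not the CGG paper's exact loss; in practice the nouns would come from a POS tagger over the caption, while here both inputs are random stand-ins.

```python
import torch
import torch.nn.functional as F

def grounding_loss(region_embeds, noun_embeds, tau=0.07):
    """region_embeds: (R, D) object queries; noun_embeds: (N, D) caption nouns."""
    r = F.normalize(region_embeds, dim=-1)
    n = F.normalize(noun_embeds, dim=-1)
    sim = r @ n.t() / tau                    # (R, N) region-noun similarity
    # MIL-style grounding: each noun should be claimed by its best-matching
    # region, so we maximize the top entry of its softmax over regions.
    per_noun = sim.softmax(dim=0).max(dim=0).values
    return -per_noun.clamp_min(1e-8).log().mean()

loss = grounding_loss(torch.randn(100, 256), torch.randn(5, 256))
```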
arXiv Detail & Related papers (2023-01-02T18:52:12Z)
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation [26.079055078561986]
We propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation.
The main idea is to gather patches into semantic regions via learnable centers, trained on text-image pairs.
Experimental results show that our model achieves comparable or superior segmentation accuracy.
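A minimal sketch of gathering patches to learnable centers: soft assignment scores turn K center vectors into assignment-weighted region features. This is a single attention-pooling step under assumed shapes; SegCLIP's actual aggregation module differs in detail.

```python
import torch
import torch.nn as nn

class PatchGather(nn.Module):
    def __init__(self, n_centers=16, dim=512):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, dim) * 0.02)

    def forward(self, patches):                  # (B, N, D) patch features
        logits = patches @ self.centers.t()      # (B, N, K) patch-center scores
        assign = logits.softmax(dim=-1)          # soft assignment of patches
        # Each center becomes the assignment-weighted mean of its patches.
        regions = torch.einsum("bnk,bnd->bkd", assign, patches)
        norm = assign.sum(dim=1, keepdim=True).transpose(1, 2).clamp_min(1e-6)
        return regions / norm, assign            # region features + patch map

regions, assign = PatchGather()(torch.randn(2, 196, 512))
```

The soft-assignment map doubles as a coarse segmentation: each patch belongs most to the center whose region feature best matches a class's text embedding.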
arXiv Detail & Related papers (2022-11-27T12:38:52Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
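The guidance mechanism might be sketched as below: per-pixel queries cross-attend to fused multi-level features before classification. The concatenation fusion and residual head are assumptions, not PCN's exact layout.

```python
import torch
import torch.nn as nn

class GuidedHead(nn.Module):
    def __init__(self, dim=256, n_classes=21, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, pixel_feats, multi_level):
        # pixel_feats: (B, HW, D) per-pixel queries; multi_level: list of (B, Ni, D).
        fused = torch.cat(multi_level, dim=1)          # fuse levels by concatenation
        guided, _ = self.attn(pixel_feats, fused, fused)
        return self.classifier(guided + pixel_feats)   # residual, then per-pixel logits

head = GuidedHead()
logits = head(torch.randn(2, 1024, 256),
              [torch.randn(2, 256, 256), torch.randn(2, 64, 256)])
```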
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive low-rank (LR) representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
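A minimal sketch of the cross-view, low-rank idea: features from two augmented views pass through a shared rank-r bottleneck and their predictions are pushed to agree. The linear factorization and MSE consistency are stand-in assumptions for SLRNet's collective matrix factorization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankHead(nn.Module):
    def __init__(self, dim=256, rank=16, n_classes=21):
        super().__init__()
        self.down = nn.Linear(dim, rank)       # rank-r bottleneck
        self.up = nn.Linear(rank, dim)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, feats):                  # (B, HW, D)
        low_rank = self.up(self.down(feats))   # low-rank reconstruction of features
        return self.cls(low_rank)

head = LowRankHead()
# Features of two augmented views of the same image (random stand-ins).
view_a, view_b = torch.randn(2, 1024, 256), torch.randn(2, 1024, 256)
pa, pb = head(view_a).softmax(-1), head(view_b).softmax(-1)
# Cross-view consistency: each view's prediction supervises the other.
loss = F.mse_loss(pa, pb.detach()) + F.mse_loss(pb, pa.detach())
```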
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
- Decoupling Zero-Shot Semantic Segmentation [46.55494691004304]
Zero-shot semantic segmentation (ZS3) aims to segment novel categories that have not been seen during training.
We propose a simple and effective zero-shot semantic segmentation model, called ZegFormer.
arXiv Detail & Related papers (2021-12-15T06:21:47Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish objects from the background and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
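Entropy-based uncertainty modeling over pseudo-labels can be pictured with the sketch below: confident (low-entropy) pixels keep their pseudo-labels for self-training, while uncertain ones are held out. The quantile-threshold rule is an assumed heuristic, not necessarily the EUMS framework's.

```python
import torch

def split_pseudo_labels(probs, keep_ratio=0.5):
    """probs: (B, C, H, W) softmax outputs on unlabeled images."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (B, H, W)
    thresh = entropy.flatten().quantile(keep_ratio)
    easy = entropy <= thresh               # confident, low-entropy pixels
    pseudo = probs.argmax(dim=1)           # (B, H, W) hard pseudo-labels
    pseudo[~easy] = 255                    # ignore-index for uncertain pixels
    return pseudo, easy

probs = torch.softmax(torch.randn(2, 21, 64, 64), dim=1)
pseudo, easy = split_pseudo_labels(probs)
```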
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- Zero-Shot Semantic Segmentation via Spatial and Multi-Scale Aware Visual Class Embedding [0.0]
We propose a language-model-free zero-shot semantic segmentation framework, the Spatial and Multi-scale aware Visual Class Embedding Network (SM-VCENet).
In experiments, our SM-VCENet outperforms the zero-shot semantic segmentation state of the art by a relative margin.
arXiv Detail & Related papers (2021-11-30T07:39:19Z)
- Few-shot 3D Point Cloud Semantic Segmentation [138.80825169240302]
We propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method.
Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings.
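A minimal sketch of multi-prototype labeling for point clouds: each support class is summarized by several k-means prototypes, and query points take the nearest prototype's label. The clustering, distance choice, and the assumption of at least k support points per class are illustrative; the paper's attention-aware and transductive components are omitted.

```python
import torch

def kmeans(x, k, iters=10):
    """Plain k-means; assumes x has at least k points."""
    centers = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centers).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(dim=0)
    return centers

def label_queries(query_feats, support_feats, support_labels, k=3):
    protos, proto_labels = [], []
    for c in support_labels.unique():      # k prototypes per support class
        protos.append(kmeans(support_feats[support_labels == c], k))
        proto_labels += [c] * k
    protos = torch.cat(protos)             # (n_classes * k, D)
    proto_labels = torch.stack(proto_labels)
    nearest = torch.cdist(query_feats, protos).argmin(dim=1)
    return proto_labels[nearest]           # nearest-prototype label per point

labels = label_queries(torch.randn(2048, 64),      # query point features
                       torch.randn(200, 64),       # support point features
                       torch.randint(0, 2, (200,)))
```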
arXiv Detail & Related papers (2020-06-22T08:05:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.