SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
- URL: http://arxiv.org/abs/2311.16241v1
- Date: Mon, 27 Nov 2023 19:00:06 GMT
- Title: SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
- Authors: Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari
- Abstract summary: In SemiVL, we propose to integrate rich priors from vision-language models into semi-supervised semantic segmentation.
We design a language-guided decoder to jointly reason over vision and language.
We evaluate SemiVL on 4 semantic segmentation datasets, where it significantly outperforms previous semi-supervised methods.
- Score: 97.00445262074595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In semi-supervised semantic segmentation, a model is trained with a limited
number of labeled images along with a large corpus of unlabeled images to
reduce the high annotation effort. While previous methods are able to learn
good segmentation boundaries, they are prone to confuse classes with similar
visual appearance due to the limited supervision. On the other hand,
vision-language models (VLMs) are able to learn diverse semantic knowledge from
image-caption datasets but produce noisy segmentation due to the image-level
training. In SemiVL, we propose to integrate rich priors from VLM pre-training
into semi-supervised semantic segmentation to learn better semantic decision
boundaries. To adapt the VLM from global to local reasoning, we introduce a
spatial fine-tuning strategy for label-efficient learning. Further, we design a
language-guided decoder to jointly reason over vision and language. Finally, we
propose to handle inherent ambiguities in class labels by providing the model
with language guidance in the form of class definitions. We evaluate SemiVL on
4 semantic segmentation datasets, where it significantly outperforms previous
semi-supervised methods. For instance, SemiVL improves the state-of-the-art by
+13.5 mIoU on COCO with 232 annotated images and by +6.1 mIoU on Pascal VOC
with 92 labels. Project page: https://github.com/google-research/semivl
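To make the language-guided classification concrete, the following is a minimal, hypothetical sketch of the core mechanism the abstract describes: per-pixel logits obtained as cosine similarity between dense vision features and text embeddings of class definitions. It is not the authors' exact decoder; all names, shapes, and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F

def language_guided_logits(pixel_feats, text_embeds, temperature=0.07):
    # pixel_feats: [B, C, H, W] dense features from a vision encoder.
    # text_embeds: [K, C] one embedding per class, e.g. CLIP text features
    # of full class definitions rather than bare class names.
    pixel_feats = F.normalize(pixel_feats, dim=1)  # unit-norm over channels
    text_embeds = F.normalize(text_embeds, dim=1)  # unit-norm over channels
    logits = torch.einsum("bchw,kc->bkhw", pixel_feats, text_embeds)
    return logits / temperature

# Toy usage with random stand-ins for real encoder outputs.
feats = torch.randn(2, 512, 32, 32)   # hypothetical dense CLIP features
texts = torch.randn(21, 512)          # e.g. 21 Pascal VOC class definitions
print(language_guided_logits(feats, texts).shape)  # torch.Size([2, 21, 32, 32])

In a semi-supervised setting, logits of this form would be supervised with the few labeled images and with pseudo-labels on the unlabeled corpus, the standard recipe the abstract builds on.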
Related papers
- Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation [44.008094698200026]
FreeDA is a training-free diffusion-augmented method for open-vocabulary semantic segmentation.
FreeDA achieves state-of-the-art performance on five datasets.
arXiv Detail & Related papers (2024-04-09T18:00:25Z)
- Grounding Everything: Emerging Localization Properties in Vision-Language Transformers [51.260510447308306]
We show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning.
We propose a Grounding Everything Module (GEM) that generalizes the value-value attention introduced by CLIPSurgery to a self-self attention path; a minimal sketch of this idea follows the list below.
We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation.
arXiv Detail & Related papers (2023-12-01T19:06:12Z)
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm that trains semantic segmentation models from sparse query-point annotations.
The approach delivers accurate segmentation while significantly reducing the cost and time of manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages pretrained vision-language (VL) models to train semantic segmentation models without human labels.
ZeroSeg distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance compared to other zero-shot segmentation methods trained on the same data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model [67.62922228676273]
We introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of target semantic categories.
We construct this artificial training data by creating a 2D map of random semantic categories and another map of their corresponding word tokens.
Our model not only establishes an effective baseline for this novel task but also achieves strong performance compared to existing methods.
arXiv Detail & Related papers (2023-03-25T08:19:31Z)
- TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representations from low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complex scenarios.
Our results show that our top-down unsupervised segmentation is robust on both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z)
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose the Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module learns better image-level representations.
The local features matching contrastive module learns representations of local regions, which benefits semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Discovering Latent Classes for Semi-Supervised Semantic Segmentation [18.5909667833129]
This paper studies the problem of semi-supervised semantic segmentation.
We learn latent classes consistent with semantic classes on labeled images.
We show that the proposed method achieves state-of-the-art results for semi-supervised semantic segmentation.
arXiv Detail & Related papers (2019-12-30T14:16:24Z)
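As flagged in the Grounding Everything entry above, the following is a minimal, hypothetical sketch of the self-self attention idea: attention weights are computed from a single projection against itself (e.g. value-value) instead of the usual query-key pairing, which tends to sharpen localization in frozen VL transformers. Function and variable names here are illustrative assumptions, not GEM's actual API.

import torch

def self_self_attention(x, proj):
    # x: [B, N, C] token features from a frozen VL transformer block.
    # proj: one of the block's linear projections (e.g. its v-projection).
    p = proj(x)                                   # [B, N, C]
    scale = p.shape[-1] ** -0.5
    attn = torch.softmax(p @ p.transpose(1, 2) * scale, dim=-1)  # p-p weights
    return attn @ p                               # aggregate the same projection

x = torch.randn(1, 197, 512)          # e.g. ViT tokens (CLS + 14x14 patches)
v_proj = torch.nn.Linear(512, 512)    # stand-in for a frozen v-projection
print(self_self_attention(x, v_proj).shape)  # torch.Size([1, 197, 512])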