Delving into Shape-aware Zero-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2304.08491v1
- Date: Mon, 17 Apr 2023 17:59:46 GMT
- Title: Delving into Shape-aware Zero-shot Semantic Segmentation
- Authors: Xinyu Liu, Beiwen Tian, Zhen Wang, Rui Wang, Kehua Sheng, Bo Zhang,
Hao Zhao, Guyue Zhou
- Abstract summary: We present shape-aware zero-shot semantic segmentation.
Inspired by classical spectral methods, we propose to leverage the eigenvectors of Laplacian matrices constructed with self-supervised pixel-wise features.
Our method sets new state-of-the-art performance for zero-shot semantic segmentation on both Pascal and COCO.
- Score: 18.51025849474123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to the impressive progress of large-scale vision-language pretraining,
recent recognition models can classify arbitrary objects in a zero-shot and
open-set manner, with surprisingly high accuracy. However, translating this
success to semantic segmentation is not trivial, because this dense prediction
task requires not only accurate semantic understanding but also fine shape
delineation, while existing vision-language models are trained with image-level
language descriptions. To bridge this gap, we pursue shape-aware
zero-shot semantic segmentation in this study. Inspired by classical spectral
methods in the image segmentation literature, we propose to leverage the
eigenvectors of Laplacian matrices constructed with self-supervised pixel-wise
features to promote shape-awareness. Although this simple and effective
technique does not make use of the masks of seen classes at all, we demonstrate
that it outperforms a state-of-the-art shape-aware formulation that aligns
ground truth and predicted edges during training. We also delve into the
performance gains achieved on different datasets using different backbones and
draw several interesting and conclusive observations: the benefit of promoting
shape-awareness relates strongly to mask compactness and language-embedding
locality. Finally, our method sets new state-of-the-art performance for
zero-shot semantic segmentation on both Pascal and COCO, with significant
margins. Code and models will be made available at https://github.com/Liuxinyv/SAZS.
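A minimal sketch of the spectral component the abstract describes: build a pixel affinity matrix from self-supervised pixel-wise features, form the normalized graph Laplacian, and take its low-frequency eigenvectors as shape cues. The feature source, kernel, and all hyperparameters (k, sigma) here are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def laplacian_eigenvectors(feats, k=5, sigma=1.0):
    """Compute the k smallest eigenvectors of a normalized graph Laplacian
    built from pixel-wise features (H*W, C), e.g. from a self-supervised model."""
    # Dense affinity from feature similarity (toy scale; sparsify with kNN for real images).
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    deg = W.sum(1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-8))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Eigenvectors with the smallest eigenvalues capture coarse object shapes.
    vals, vecs = eigsh(L, k=k, which="SM")
    return vecs  # (H*W, k) soft segmentation cues

# Example: a 16x16 "image" with 64-dim self-supervised pixel features.
feats = np.random.randn(16 * 16, 64).astype(np.float64)
shape_cues = laplacian_eigenvectors(feats, k=5)
print(shape_cues.shape)  # (256, 5)
```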
Related papers
- Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation [114.72734384299476]
We propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information.
We leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.
Our approach significantly boosts the capacity of segmentation models for unseen classes.
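A hedged sketch of the anchoring idea summarized above: treat fixed class (text) embeddings as anchors and pull vision features toward the embedding of their class via cosine-similarity logits. The loss form and names are illustrative assumptions, not the actual LDVC objective.

```python
import torch
import torch.nn.functional as F

def anchor_alignment_loss(vision_feats, class_embeds, labels, tau=0.07):
    """Steer vision features toward fixed class-embedding anchors.
    vision_feats: (N, D) pixel/patch features; class_embeds: (K, D); labels: (N,)."""
    v = F.normalize(vision_feats, dim=-1)
    c = F.normalize(class_embeds, dim=-1)
    logits = v @ c.detach().t() / tau  # anchors stay fixed (detached)
    return F.cross_entropy(logits, labels)

# Toy usage: 8 features, 4 classes, 512-dim embeddings.
loss = anchor_alignment_loss(torch.randn(8, 512), torch.randn(4, 512),
                             torch.randint(0, 4, (8,)))
```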
arXiv Detail & Related papers (2024-03-13T11:23:55Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages an existing pretrained vision-language (VL) model to train semantic segmentation models without human labels.
ZeroSeg achieves this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
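The distillation step might look roughly like this: give each segment token a corresponding image region and push the token toward the frozen VL model's embedding of that region. The token construction, the `vl_image_encoder` callable, and the cosine loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def segment_distill_loss(segment_tokens, region_crops, vl_image_encoder):
    """Distill VL visual concepts into segment tokens (illustrative).
    segment_tokens: (S, D) learned tokens, one per localized region;
    region_crops: (S, 3, H, W) the corresponding image regions."""
    with torch.no_grad():
        # Hypothetical frozen VL image encoder acting as the teacher.
        teacher = F.normalize(vl_image_encoder(region_crops), dim=-1)
    student = F.normalize(segment_tokens, dim=-1)
    # Maximize cosine similarity between each token and its region's VL embedding.
    return (1 - (student * teacher).sum(-1)).mean()
```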
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- CLIP-S$^4$: Language-Guided Self-Supervised Semantic Segmentation [15.29479338808226]
We present CLIP-S$^4$, which leverages self-supervised pixel representation learning and vision-language models to enable various semantic segmentation tasks.
Our approach shows consistent and substantial performance improvement over four popular benchmarks.
arXiv Detail & Related papers (2023-05-01T19:01:01Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited by learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, a decent performance gain is achieved on both small and large models.
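One plausible minimal instantiation of a context-aware classifier, under assumptions: condition otherwise static class weights on a pooled per-image context vector, then classify pixels with the adapted weights. This mirrors the idea described, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ContextAwareClassifier(nn.Module):
    """Adapt static class weights with a per-image context vector (illustrative)."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.cls_weight = nn.Parameter(torch.randn(num_classes, dim))
        self.adapt = nn.Linear(dim, dim)  # maps pooled context to a weight offset

    def forward(self, feats):  # feats: (B, D, H, W)
        context = feats.mean(dim=(2, 3))                           # (B, D) global context
        w = self.cls_weight[None] + self.adapt(context)[:, None]   # (B, K, D) adapted weights
        return torch.einsum("bkd,bdhw->bkhw", w, feats)            # per-pixel class logits
```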
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation [40.09476732999614]
Mask proposal models have significantly improved the performance of zero-shot semantic segmentation.
The use of a 'background' embedding during training in these methods is problematic, as the resulting model tends to over-learn and assign all unseen classes to the background class instead of their correct labels.
This paper proposes novel class enhancement losses to bypass the use of the background embedding during training, and simultaneously exploits the semantic relationship between text embeddings and mask proposals by ranking the similarity scores.
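A hedged sketch of ranking text-to-proposal similarity without a background embedding: score each mask proposal against all class text embeddings and apply a margin loss that places the true class above the rest. The hinge form and margin value are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def ranking_loss(mask_feats, text_embeds, labels, margin=0.2):
    """Rank similarity of mask proposals to text embeddings, with no 'background' class.
    mask_feats: (M, D) pooled proposal features; text_embeds: (K, D); labels: (M,)."""
    sim = F.normalize(mask_feats, dim=-1) @ F.normalize(text_embeds, dim=-1).t()  # (M, K)
    pos = sim.gather(1, labels[:, None])           # similarity to the true class
    # Hinge: every other class must trail the true class by at least `margin`.
    viol = (sim - pos + margin).clamp(min=0)
    mask = F.one_hot(labels, sim.size(1)).bool()
    viol = viol.masked_fill(mask, 0.0)             # ignore the positive itself
    return viol.mean()
```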
arXiv Detail & Related papers (2023-01-18T06:55:02Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
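A rough sketch of affinity-based refinement in the spirit of that online step: propagate class scores between pixels with similar colors, assuming a simple Gaussian color affinity (the paper additionally uses class affinities; the kernel and parameters here are made up).

```python
import numpy as np

def refine_with_color_affinity(probs, image, sigma=0.1, iters=3):
    """Iteratively smooth per-pixel class probabilities using color similarity.
    probs: (N, K) softmax scores; image: (N, 3) RGB in [0, 1] (flattened pixels)."""
    d2 = np.square(image[:, None, :] - image[None, :, :]).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))   # dense color affinity (toy scale only)
    A /= A.sum(1, keepdims=True)         # row-normalize into a transition matrix
    for _ in range(iters):
        probs = A @ probs                # propagate scores across similar colors
    return probs / probs.sum(1, keepdims=True)
```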
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
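The two-stage recipe such a baseline suggests can be sketched as: generate class-agnostic mask proposals, then zero-shot classify each masked region with CLIP. The `clip` calls below follow the public OpenAI package; the proposal generator and prompt template are left as assumptions.

```python
import torch
import clip  # https://github.com/openai/CLIP

def classify_proposals(image_crops, class_names, device="cpu"):
    """Zero-shot classify masked region crops with CLIP.
    image_crops: (M, 3, 224, 224) preprocessed crops, one per mask proposal."""
    model, _ = clip.load("ViT-B/32", device=device)
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image_crops.to(device)).float()
        txt_emb = model.encode_text(text).float()
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.t()).argmax(dim=-1)  # class index per proposal
```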
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
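Under illustrative assumptions, one round of group-wise message passing over image nodes might look like this: each image contributes one node feature, and a fully connected graph within the group lets images exchange semantic evidence before pseudo ground-truths are estimated. This is a generic GNN layer, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GroupGNNLayer(nn.Module):
    """One round of message passing over a group of images (nodes), illustrative."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, x):  # x: (G, D), one feature vector per image in the group
        att = torch.softmax(x @ x.t() / x.size(1) ** 0.5, dim=-1)  # node affinities
        m = att @ self.msg(x)                                      # aggregate messages
        return torch.relu(self.upd(torch.cat([x, m], dim=-1)))     # updated node features
```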
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
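A minimal sketch of decomposing a class into part prototypes, assuming simple k-means over masked support features (the paper instead generates and enhances prototypes with a GNN):

```python
import torch

def part_prototypes(support_feats, support_mask, num_parts=4, iters=10):
    """Cluster foreground support features into part-aware prototypes.
    support_feats: (D, H, W); support_mask: (H, W) binary foreground mask."""
    fg = support_feats.flatten(1)[:, support_mask.flatten() > 0].t()  # (Nfg, D)
    protos = fg[torch.randperm(fg.size(0))[:num_parts]]               # k-means init
    for _ in range(iters):
        assign = torch.cdist(fg, protos).argmin(dim=1)                # nearest prototype
        for k in range(num_parts):
            if (assign == k).any():
                protos[k] = fg[assign == k].mean(dim=0)
    return protos  # (num_parts, D) part-level class representation
```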
arXiv Detail & Related papers (2020-07-13T11:03:09Z)