SIGN: Spatial-information Incorporated Generative Network for
Generalized Zero-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2108.12517v1
- Date: Fri, 27 Aug 2021 22:18:24 GMT
- Title: SIGN: Spatial-information Incorporated Generative Network for
Generalized Zero-shot Semantic Segmentation
- Authors: Jiaxin Cheng, Soumyaroop Nandi, Prem Natarajan, Wael Abd-Almageed
- Abstract summary: Zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level.
Relative Positional Encoding integrates spatial information at the feature level and can handle arbitrary image sizes.
Annealed Self-Training can automatically assign different importance to pseudo-labels.
- Score: 22.718908677552196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike conventional zero-shot classification, zero-shot semantic segmentation
predicts a class label at the pixel level instead of the image level. When
solving zero-shot semantic segmentation problems, the need for pixel-level
prediction with surrounding context motivates us to incorporate spatial
information using positional encoding. We improve standard positional encoding
by introducing the concept of Relative Positional Encoding, which integrates
spatial information at the feature level and can handle arbitrary image sizes.
Furthermore, while self-training is widely used in zero-shot semantic
segmentation to generate pseudo-labels, we propose a new
knowledge-distillation-inspired self-training strategy, namely Annealed
Self-Training, which can automatically assign different importance to
pseudo-labels to improve performance. We systematically study the proposed
Relative Positional Encoding and Annealed Self-Training in a comprehensive
experimental evaluation, and our empirical results confirm the effectiveness of
our method on three benchmark datasets.
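The listing does not reproduce the paper's implementation, so the following is a minimal sketch of one way to realize a feature-level positional encoding from size-normalized coordinates, matching the two properties named in the abstract (feature-level integration, arbitrary image sizes). The function name, sinusoidal projection, and additive fusion are illustrative assumptions, not the authors' code.

    # Hypothetical sketch of a feature-level relative positional encoding.
    # Coordinates are normalized by the feature-map size, so the encoding
    # is well defined for any input resolution. Design choices here are
    # assumptions, not the SIGN paper's implementation.
    import math
    import torch

    def relative_positional_encoding(h: int, w: int, dim: int) -> torch.Tensor:
        """Return a (dim, h, w) sinusoidal encoding of relative positions."""
        assert dim % 4 == 0, "dim must split into sin/cos for y and x"
        ys = torch.linspace(0.0, 1.0, h).view(h, 1).expand(h, w)  # relative rows
        xs = torch.linspace(0.0, 1.0, w).view(1, w).expand(h, w)  # relative cols
        freqs = torch.exp(torch.arange(dim // 4) * (-math.log(1e4) / (dim // 4)))
        parts = []
        for coord in (ys, xs):                        # encode each axis separately
            angles = coord.unsqueeze(0) * freqs.view(-1, 1, 1) * 2 * math.pi
            parts += [torch.sin(angles), torch.cos(angles)]
        return torch.cat(parts, dim=0)                # (dim, h, w)

    # Usage: add the encoding to a backbone feature map of any spatial size.
    feats = torch.randn(2, 256, 33, 45)               # (batch, C, H, W)
    feats = feats + relative_positional_encoding(33, 45, 256)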
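Likewise, a hedged sketch of annealed self-training as described: each pseudo-labeled pixel is weighted by its teacher confidence sharpened through a temperature that decays over training, echoing knowledge-distillation temperatures. The linear schedule and power weighting below are assumptions; the paper's exact formulation may differ.

    # Hypothetical annealed self-training loss: pseudo-labels are weighted
    # by confidence raised to 1/temperature, and the temperature anneals,
    # so low-confidence pixels contribute progressively less. Illustrative
    # only; not the SIGN paper's exact scheme.
    import torch
    import torch.nn.functional as F

    def annealed_pseudo_label_loss(logits: torch.Tensor,
                                   teacher_probs: torch.Tensor,
                                   step: int, total_steps: int,
                                   t_start: float = 2.0,
                                   t_end: float = 0.5) -> torch.Tensor:
        """logits: (B, C, H, W) student outputs; teacher_probs: (B, C, H, W)
        softmax outputs of the previous-round model used as pseudo-labels."""
        t = t_start + (t_end - t_start) * step / max(total_steps, 1)
        conf, pseudo = teacher_probs.max(dim=1)           # (B, H, W)
        weight = conf ** (1.0 / t)                        # sharpened importance
        ce = F.cross_entropy(logits, pseudo, reduction="none")
        return (weight * ce).mean()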
Related papers
- Semi-supervised Semantic Segmentation Meets Masked Modeling: Fine-grained
Locality Learning Matters in Consistency Regularization [31.333862320143968]
Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning.
We propose a novel framework called MaskMatch, which enables fine-grained locality learning to achieve better dense segmentation.
arXiv Detail & Related papers (2023-12-14T03:28:53Z)
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm for training semantic segmentation models from query point annotations.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models.
ZeroSeg overcomes the lack of pixel-level supervision in such models by distilling the visual concepts they learn into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
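A rough, purely hypothetical illustration of such a pretext task (not the authors' code): mask a fraction of the reference patch features, then train a head to classify which grid cell a query patch came from.

    # Hypothetical relative-location pretext head: pool the visible
    # reference patches and classify the query's grid cell. Masking more
    # reference features makes the task harder.
    import torch
    import torch.nn as nn

    class RelativeLocationHead(nn.Module):
        def __init__(self, dim: int, grid: int):
            super().__init__()
            self.cls = nn.Linear(2 * dim, grid * grid)   # one class per cell

        def forward(self, ref_feats, query_feat, mask_ratio=0.5):
            # ref_feats: (B, N, D) patch features; query_feat: (B, D)
            B, N, D = ref_feats.shape
            keep = torch.rand(B, N, device=ref_feats.device) > mask_ratio
            ref = ref_feats * keep.unsqueeze(-1)         # zero masked patches
            context = ref.mean(dim=1)                    # pooled visible context
            return self.cls(torch.cat([context, query_feat], dim=-1))

    head = RelativeLocationHead(dim=128, grid=7)
    logits = head(torch.randn(4, 49, 128), torch.randn(4, 128))  # (4, 49)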
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Language-driven Semantic Segmentation [88.21498323896475]
We present LSeg, a novel model for language-driven semantic image segmentation.
We use a text encoder to compute embeddings of descriptive input labels.
The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
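A minimal sketch of this kind of pixel-to-text alignment, assuming CLIP-style normalized embeddings and a softmax over classes; shapes and the temperature are illustrative, not LSeg's actual implementation.

    # Hypothetical pixel-text alignment loss: score every pixel embedding
    # against each class's text embedding and train with cross-entropy.
    import torch
    import torch.nn.functional as F

    def pixel_text_alignment_loss(pixel_emb, text_emb, labels, temperature=0.07):
        """pixel_emb: (B, D, H, W); text_emb: (K, D), one row per class;
        labels: (B, H, W) ground-truth class indices."""
        pixel_emb = F.normalize(pixel_emb, dim=1)
        text_emb = F.normalize(text_emb, dim=1)
        logits = torch.einsum("bdhw,kd->bkhw", pixel_emb, text_emb) / temperature
        return F.cross_entropy(logits, labels)

    loss = pixel_text_alignment_loss(
        torch.randn(2, 512, 16, 16), torch.randn(21, 512),
        torch.randint(0, 21, (2, 16, 16)))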
arXiv Detail & Related papers (2022-01-10T18:59:10Z)
- Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation [13.996217500923413]
Semi/self-supervised learning-based approaches exploit unlabeled data along with limited annotated data.
Recent self-supervised learning methods use contrastive loss to learn good global-level representations from unlabeled images.
We propose a local contrastive loss to learn good pixel-level features useful for segmentation by exploiting semantic label information.
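A hedged sketch of what a label-guided local contrastive loss can look like: sampled pixel features sharing a (pseudo-)label act as positives and all other pixels as negatives. Sampling and masking details are assumptions, not the paper's exact loss.

    # Hypothetical supervised pixel-level contrastive loss over a sampled
    # set of pixel features; same-label pairs are pulled together.
    import torch
    import torch.nn.functional as F

    def local_contrastive_loss(feats, labels, temperature=0.1):
        """feats: (N, D) sampled pixel features; labels: (N,) class per pixel."""
        feats = F.normalize(feats, dim=1)
        sim = feats @ feats.t() / temperature                   # (N, N)
        eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
        pos = labels.view(-1, 1).eq(labels.view(1, -1)) & ~eye  # positive pairs
        logp = sim.masked_fill(eye, float("-inf")).log_softmax(dim=1)
        n_pos = pos.sum(dim=1).clamp(min=1)                     # guard empty rows
        return (-logp.masked_fill(~pos, 0.0).sum(dim=1) / n_pos).mean()

    loss = local_contrastive_loss(torch.randn(64, 128), torch.randint(0, 5, (64,)))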
arXiv Detail & Related papers (2021-12-17T17:38:56Z)
- InfoSeg: Unsupervised Semantic Image Segmentation with Mutual Information Maximization [0.0]
We propose a novel method for unsupervised semantic image segmentation based on mutual information maximization between local and global high-level image features.
In the first step, we segment images based on local and global features.
In the second step, we maximize the mutual information between local features and high-level features of their respective class.
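One standard way to maximize such mutual information is an InfoNCE-style lower bound. The sketch below pairs each image's local features with its own global feature and treats other images as negatives; this simplifies InfoSeg's per-class objective, and all names and shapes are assumptions.

    # Hypothetical InfoNCE bound on local-global mutual information:
    # each image's global vector should score highest with its own pixels.
    import torch
    import torch.nn.functional as F

    def local_global_infonce(local_feats, global_feats, temperature=0.1):
        """local_feats: (B, D, H, W); global_feats: (B, D), one per image."""
        B, D, H, W = local_feats.shape
        local = F.normalize(local_feats.flatten(2), dim=1)   # (B, D, H*W)
        glob = F.normalize(global_feats, dim=1)              # (B, D)
        sim = torch.einsum("kd,bdn->bnk", glob, local) / temperature
        target = torch.arange(B, device=sim.device).repeat_interleave(H * W)
        return F.cross_entropy(sim.reshape(B * H * W, B), target)

    loss = local_global_infonce(torch.randn(4, 64, 8, 8), torch.randn(4, 64))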
arXiv Detail & Related papers (2021-10-07T14:01:42Z)
- Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images [25.350677396144075]
This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera.
We segment egocentric images directly in terms of how a robot can navigate in them, and tailor the learning problem to an autonomous navigation task.
We present a generic and scalable affordance-based definition consisting of 3 driveability levels which can be applied to arbitrary scenes.
arXiv Detail & Related papers (2021-09-15T12:25:56Z)
- Context-aware Feature Generation for Zero-shot Semantic Segmentation [18.37777970377439]
We propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.
Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation.
arXiv Detail & Related papers (2020-08-16T12:20:49Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in embedding space.
This paper presents an adaptive tuning framework, in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.