Context-aware Feature Generation for Zero-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2008.06893v1
- Date: Sun, 16 Aug 2020 12:20:49 GMT
- Title: Context-aware Feature Generation for Zero-shot Semantic Segmentation
- Authors: Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, and Liqing Zhang
- Abstract summary: We propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.
Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation.
- Score: 18.37777970377439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing semantic segmentation models heavily rely on dense pixel-wise
annotations. To reduce the annotation pressure, we focus on a challenging task
named zero-shot semantic segmentation, which aims to segment unseen objects
with zero annotations. This task can be accomplished by transferring knowledge
across categories via semantic word embeddings. In this paper, we propose a
novel context-aware feature generation method for zero-shot segmentation named
CaGNet. In particular, with the observation that a pixel-wise feature highly
depends on its contextual information, we insert a contextual module in a
segmentation network to capture the pixel-wise contextual information, which
guides the process of generating more diverse and context-aware features from
semantic word embeddings. Our method achieves state-of-the-art results on three
benchmark datasets for zero-shot segmentation. Codes are available at:
https://github.com/bcmi/CaGNet-Zero-Shot-Semantic-Segmentation.
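As the abstract describes, the contextual module produces a pixel-wise latent code that, together with the class word embedding, conditions the feature generator. The following is a minimal NumPy sketch of that conditioning; the function names, shapes, and single-layer stand-ins are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def contextual_module(feature_map, w_ctx):
    # feature_map: (H, W, C) backbone features -> per-pixel latent codes (H, W, Z).
    # A single 1x1 projection stands in for the paper's contextual module.
    return np.tanh(feature_map @ w_ctx)

def generate_features(word_embedding, latent_codes, w_gen):
    # Condition the generator on the semantic word embedding AND the
    # pixel-wise latent code, so synthesized features vary with context.
    H, W, Z = latent_codes.shape
    tiled = np.broadcast_to(word_embedding, (H, W, word_embedding.shape[0]))
    inp = np.concatenate([tiled, latent_codes], axis=-1)  # (H, W, E + Z)
    return np.maximum(inp @ w_gen, 0.0)  # ReLU

C, Z, E, F = 8, 4, 6, 10
feat = rng.standard_normal((5, 5, C))      # backbone feature map
w_ctx = rng.standard_normal((C, Z))
w_gen = rng.standard_normal((E + Z, F))
emb = rng.standard_normal(E)               # word embedding of an unseen class

latents = contextual_module(feat, w_ctx)
fake = generate_features(emb, latents, w_gen)
print(fake.shape)  # (5, 5, 10): one synthesized feature per pixel
```

Because every pixel carries its own latent code, the synthesized features for a single class differ across pixels, which is the "more diverse and context-aware" behavior the abstract claims.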
Related papers
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm to train semantic segmentation algorithms.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages the existing pretrained vision-language model (VL) to train semantic segmentation models.
ZeroSeg avoids the need for human labels by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation [22.718908677552196]
Zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level.
Relative Positional Encoding integrates spatial information at the feature level and can handle arbitrary image sizes.
Anneal Self-Training can automatically assign different importance to pseudo-labels.
arXiv Detail & Related papers (2021-08-27T22:18:24Z)
- Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning [28.498782661888775]
We formulate weakly supervised segmentation as a semi-supervised metric learning problem.
We propose 4 types of contrastive relationships between pixels and segments in the feature space.
We deliver a universal weakly supervised segmenter with significant gains on Pascal VOC and DensePose.
arXiv Detail & Related papers (2021-05-03T15:49:01Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
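The core idea of this entry can be sketched as a supervised InfoNCE loss over pixel embeddings. This is a minimal NumPy illustration of the general technique; the paper's actual loss, sampling strategy, and cross-image memory design may differ:

```python
import numpy as np

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    # embeddings: (N, D) L2-normalized pixel embeddings; labels: (N,) class ids.
    # Supervised InfoNCE: pixels of the same class are positives for each other.
    sim = embeddings @ embeddings.T / temperature            # (N, N) similarities
    np.fill_diagonal(sim, -np.inf)                           # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    pos_counts = same.sum(1)
    valid = pos_counts > 0                                   # anchors with positives
    # Average log-probability over each anchor's positives, then over anchors.
    loss = -(np.where(same, log_prob, 0.0).sum(1)[valid] / pos_counts[valid]).mean()
    return loss

# Toy usage: two classes whose pixel embeddings are already well separated,
# so the loss is close to zero.
pts = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(pixel_contrastive_loss(pts, labels))
```

Minimizing this loss pulls same-class pixel embeddings together and pushes different-class ones apart, which is the similarity structure the summary above describes.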
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
- From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation [22.88452754438478]
We focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations.
We propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories.
Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods.
arXiv Detail & Related papers (2020-09-25T13:26:30Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low and high frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module to impose a scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene.
We also design a region similarity loss to propagate distinguishing features to their own neighboring points with the same label.
arXiv Detail & Related papers (2020-01-24T16:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.