Primitive Generation and Semantic-related Alignment for Universal
Zero-Shot Segmentation
- URL: http://arxiv.org/abs/2306.11087v1
- Date: Mon, 19 Jun 2023 17:59:16 GMT
- Title: Primitive Generation and Semantic-related Alignment for Universal
Zero-Shot Segmentation
- Authors: Shuting He, Henghui Ding, Wei Jiang
- Abstract summary: We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples.
We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces.
The proposed approach achieves state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.
- Score: 13.001629605405954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study universal zero-shot segmentation in this work to achieve panoptic,
instance, and semantic segmentation for novel categories without any training
samples. Such zero-shot segmentation ability relies on inter-class
relationships in semantic space to transfer the visual knowledge learned from
seen categories to unseen ones. It is therefore desirable to bridge the
semantic and visual spaces well and to apply semantic relationships to visual feature
learning. We introduce a generative model to synthesize features for unseen
categories, which links the semantic and visual spaces and addresses the
lack of training data for unseen categories. Furthermore, to mitigate the domain gap
between the semantic and visual spaces, we first enhance the vanilla generator
with learned primitives, each capturing fine-grained category-related
attributes, and synthesize unseen features by selectively assembling these
primitives. Second, we disentangle the visual feature into a
semantic-related part and a semantic-unrelated part that contains useful
visual classification cues but is less relevant to the semantic representation.
The inter-class relationships of the semantic-related visual features are then
aligned with those in the semantic space, thereby transferring semantic
knowledge to visual feature learning. The proposed approach achieves
state-of-the-art performance on zero-shot panoptic segmentation, instance
segmentation, and semantic segmentation. Code is available at
https://henghuiding.github.io/PADing/.
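The primitive-based generation described above can be pictured as attention over a learned primitive bank. The following is a minimal PyTorch sketch of that idea; the primitive count, dimensions, noise injection, and single attention-based assembly are illustrative assumptions, not the actual PADing implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class PrimitiveGenerator(nn.Module):
    """Sketch: synthesize a class's visual feature by selectively
    assembling a bank of learned primitives, conditioned on the
    class's semantic embedding (e.g., a word vector)."""

    def __init__(self, num_primitives=64, prim_dim=256, sem_dim=512, feat_dim=256):
        super().__init__()
        # Learnable primitive bank; each row is meant to capture a
        # fine-grained, category-related attribute.
        self.primitives = nn.Parameter(torch.randn(num_primitives, prim_dim))
        # Project the semantic embedding into primitive space to act
        # as an attention query over the bank.
        self.query_proj = nn.Linear(sem_dim, prim_dim)
        self.out_proj = nn.Linear(prim_dim, feat_dim)

    def forward(self, sem_emb, noise_scale=0.1):
        # sem_emb: (B, sem_dim) semantic embeddings of target classes.
        query = self.query_proj(sem_emb)
        # Noise makes repeated calls produce diverse synthetic samples.
        query = query + noise_scale * torch.randn_like(query)
        # Attention weights decide which primitives to assemble.
        attn = torch.softmax(query @ self.primitives.t(), dim=-1)  # (B, N)
        assembled = attn @ self.primitives                         # (B, prim_dim)
        return self.out_proj(assembled)                            # (B, feat_dim)

# Example: draw 8 synthetic feature samples for one unseen class.
gen = PrimitiveGenerator()
unseen_word_vec = torch.randn(1, 512).repeat(8, 1)  # placeholder embedding
fake_feats = gen(unseen_word_vec)
print(fake_feats.shape)  # torch.Size([8, 256])
```

Because the primitive bank is shared across all classes, unseen categories can reuse the fine-grained attributes learned from seen ones, which is what links the semantic and visual spaces here.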
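The semantic-related alignment can likewise be sketched as matching two inter-class cosine-similarity matrices, one computed from semantic-related visual prototypes and one from semantic embeddings. The two-linear-head disentangler, the per-class prototype setup, and the MSE objective below are simplifying assumptions for illustration, not the paper's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Disentangler(nn.Module):
    """Split a visual feature into a semantic-related part and a
    semantic-unrelated part (two linear heads, purely illustrative)."""

    def __init__(self, feat_dim=256, rel_dim=128, unrel_dim=128):
        super().__init__()
        self.rel_head = nn.Linear(feat_dim, rel_dim)
        self.unrel_head = nn.Linear(feat_dim, unrel_dim)

    def forward(self, feats):
        # feats: (N, feat_dim) visual features.
        return self.rel_head(feats), self.unrel_head(feats)

def relation_alignment_loss(rel_protos, sem_embs):
    """Match the inter-class relation structure of semantic-related
    visual prototypes to the one measured among semantic embeddings.

    rel_protos: (C, D_v) one semantic-related prototype per class.
    sem_embs:   (C, D_s) semantic embedding (e.g., word vector) per class.
    """
    v = F.normalize(rel_protos, dim=-1)
    s = F.normalize(sem_embs, dim=-1)
    rel_visual = v @ v.t()    # (C, C) relations, visual side
    rel_semantic = s @ s.t()  # (C, C) relations, semantic side
    return F.mse_loss(rel_visual, rel_semantic)

# Example: 10 classes, one (averaged) visual feature per class.
dis = Disentangler()
class_feats = torch.randn(10, 256)
rel_part, unrel_part = dis(class_feats)
word_vecs = torch.randn(10, 512)  # placeholder semantic embeddings
loss = relation_alignment_loss(rel_part, word_vecs)
loss.backward()  # gradients flow into the disentangler's heads
print(loss.item())
```

Minimizing this loss pushes the semantic-related branch to inherit the inter-class geometry of the semantic space, while the semantic-unrelated branch stays free to keep discriminative cues that word embeddings do not encode.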
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and to learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method, which fully exploits the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without requiring any dense annotations.
On three benchmark datasets, our method directly segments objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality, semantically consistent representations that capture the intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation [22.88452754438478]
We focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations.
We propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories.
Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods.
arXiv Detail & Related papers (2020-09-25T13:26:30Z)
- A Novel Perspective to Zero-shot Learning: Towards an Alignment of Manifold Structures via Semantic Feature Expansion [17.48923061278128]
A common practice in zero-shot learning is to train a projection between the visual and semantic feature spaces using labeled examples of seen classes.
Under this paradigm, most existing methods suffer from the domain shift problem, which weakens zero-shot recognition performance.
We propose a novel model called AMS-SFE that considers the alignment of manifold structures by semantic feature expansion.
arXiv Detail & Related papers (2020-04-30T14:08:10Z)