Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic
Segmentation
- URL: http://arxiv.org/abs/2108.06536v1
- Date: Sat, 14 Aug 2021 13:33:58 GMT
- Title: Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic
Segmentation
- Authors: Donghyeon Baek, Youngmin Oh, Bumsub Ham
- Abstract summary: Generalized zero-shot semantic segmentation (GZS3) predicts pixel-wise semantic labels for seen and unseen classes.
Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classes from corresponding semantic ones.
We propose a discriminative approach to address these limitations in a unified framework.
- Score: 25.070027668717422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of generalized zero-shot semantic segmentation (GZS3)
predicting pixel-wise semantic labels for seen and unseen classes. Most GZS3
methods adopt a generative approach that synthesizes visual features of unseen
classes from corresponding semantic ones (e.g., word2vec) to train novel
classifiers for both seen and unseen classes. Although generative methods show
decent performance, they have two limitations: (1) the visual features are
biased towards seen classes; (2) the classifier should be retrained whenever
novel unseen classes appear. We propose a discriminative approach to address
these limitations in a unified framework. To this end, we leverage visual and
semantic encoders to learn a joint embedding space, where the semantic encoder
transforms semantic features to semantic prototypes that act as centers for
visual features of corresponding classes. Specifically, we introduce
boundary-aware regression (BAR) and semantic consistency (SC) losses to learn
discriminative features. Our approach to exploiting the joint embedding space,
together with BAR and SC terms, alleviates the seen bias problem. At test time,
we avoid the retraining process by exploiting semantic prototypes as a
nearest-neighbor (NN) classifier. To further alleviate the bias problem, we
also propose an inference technique, dubbed Apollonius calibration (AC), that
modulates the decision boundary of the NN classifier to the Apollonius circle
adaptively. Experimental results demonstrate the effectiveness of our
framework, achieving a new state of the art on standard benchmarks.
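A minimal sketch of the inference step described above, assuming a Euclidean metric in the joint embedding space and a single scalar penalty `gamma` on seen-class distances (both are illustrative assumptions; the BAR and SC training losses and the paper's exact Apollonius calibration rule are not reproduced here). Scaling the distances to seen prototypes by a constant ratio moves the seen/unseen decision boundary from the perpendicular bisector to an Apollonius sphere, which is the intuition behind AC.

```python
import numpy as np

def nn_classify_with_calibration(features, prototypes, seen_mask, gamma=1.2):
    """Nearest-prototype classification with a simple seen-class penalty.

    features:   (N, D) pixel embeddings in the joint space
    prototypes: (C, D) one semantic prototype per class
    seen_mask:  (C,)   boolean, True for seen classes
    gamma:      scalar > 1, an illustrative stand-in for the calibration factor
    """
    # Pairwise Euclidean distances between pixels and prototypes: (N, C).
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    # Penalizing seen classes by a constant ratio bends the seen/unseen
    # boundary into an Apollonius sphere, countering the seen-class bias.
    dists[:, seen_mask] *= gamma
    # Each pixel is assigned to the class of its (calibrated) nearest prototype.
    return dists.argmin(axis=1)

# Toy usage: 2-D embeddings scattered around the unseen-class prototype.
rng = np.random.default_rng(0)
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
seen_mask = np.array([True, True, False])
features = rng.normal(loc=prototypes[2], scale=0.8, size=(5, 2))
print(nn_classify_with_calibration(features, prototypes, seen_mask))  # mostly class 2
```

Because the classifier is just a nearest-neighbor lookup over prototypes, adding a new unseen class only requires appending its semantic prototype; no retraining is involved, which is the motivation for avoiding the generative pipeline.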
Related papers
- Semantic Enhanced Few-shot Object Detection [37.715912401900745]
We propose a fine-tuning based FSOD framework that utilizes semantic embeddings for better detection.
Our method allows each novel class to construct a compact feature space without being confused with similar base classes.
arXiv Detail & Related papers (2024-06-19T12:40:55Z)
- Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation [114.72734384299476]
We propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information.
We leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.
Our approach significantly boosts the capacity of segmentation models for unseen classes.
arXiv Detail & Related papers (2024-03-13T11:23:55Z)
- SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.7420433640907689]
Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from the seen classes.
This paper introduces a dual strategy to address the generalization gap.
arXiv Detail & Related papers (2023-12-20T15:18:51Z)
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
- Cluster-based Contrastive Disentangling for Generalized Zero-Shot Learning [25.92340532509084]
Generalized Zero-Shot Learning (GZSL) aims to recognize both seen and unseen classes by training on only the seen classes.
We propose a Cluster-based Contrastive Disentangling (CCD) method to improve GZSL by alleviating the semantic gap and domain shift problems.
arXiv Detail & Related papers (2022-03-05T02:50:12Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
- Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
arXiv Detail & Related papers (2021-06-01T02:17:36Z)
- Semantic Borrowing for Generalized Zero-Shot Learning [0.0]
Generalized zero-shot learning (GZSL) is one of the most realistic problems, but also one of the most challenging.
Instance-borrowing methods solve this problem to some extent with the help of testing semantics.
This paper proposes a novel method, Semantic Borrowing, to improve GZSL methods with compatibility metric learning under the CIII setting.
arXiv Detail & Related papers (2021-01-30T12:14:28Z)
- Entropy-Based Uncertainty Calibration for Generalized Zero-Shot Learning [49.04790688256481]
The goal of generalized zero-shot learning (GZSL) is to recognise both seen and unseen classes.
Most GZSL methods typically learn to synthesise visual representations from semantic information on the unseen classes.
We propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features.
arXiv Detail & Related papers (2021-01-09T05:21:27Z)
- Attribute-Induced Bias Eliminating for Transductive Zero-Shot Learning [144.94728981314717]
We propose a novel Attribute-Induced Bias Eliminating (AIBE) module for Transductive ZSL.
For the visual bias between the two domains, a Mean-Teacher module is first leveraged to bridge the visual representation discrepancy.
An attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories.
Finally, for the semantic-visual bias in the unseen domain, an unseen semantic alignment constraint is designed to align visual and semantic space in an unsupervised manner.
arXiv Detail & Related papers (2020-05-31T02:08:01Z)