Recursive Training for Zero-Shot Semantic Segmentation
- URL: http://arxiv.org/abs/2103.00086v1
- Date: Fri, 26 Feb 2021 23:44:16 GMT
- Title: Recursive Training for Zero-Shot Semantic Segmentation
- Authors: Ce Wang, Moshiur Farazi, Nick Barnes
- Abstract summary: We propose a recursive training scheme to supervise the retraining of a semantic segmentation model for a zero-shot setting.
We show that our proposed model achieves state-of-the-art performance on the Pascal-VOC 2012 dataset and Pascal-Context dataset.
- Score: 26.89352005206994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General purpose semantic segmentation relies on a backbone CNN network to
extract discriminative features that help classify each image pixel into a
'seen' object class (i.e., the object classes available during training) or a
background class. Zero-shot semantic segmentation is a challenging task that
requires a computer vision model to identify image pixels belonging to an
object class which it has never seen before. Equipping a general purpose
semantic segmentation model to separate image pixels of 'unseen' classes from
the background remains an open challenge. Some recent models have approached
this problem by fine-tuning the final pixel classification layer of a semantic
segmentation model for a zero-shot setting, but struggle to learn
discriminative features due to the lack of supervision. We propose a recursive
training scheme to supervise the retraining of a semantic segmentation model
for a zero-shot setting using a pseudo-feature representation. To this end, we
propose a Zero-Shot Maximum Mean Discrepancy (ZS-MMD) loss that weighs high
confidence outputs of the pixel classification layer as a pseudo-feature
representation, and feeds it back to the generator. By closing the loop on the
generator end, we provide supervision during retraining that in turn helps the
model learn a more discriminative feature representation for 'unseen' classes.
We show that using our recursive training and ZS-MMD loss, our proposed model
achieves state-of-the-art performance on the Pascal-VOC 2012 dataset and
Pascal-Context dataset.
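The abstract does not spell out the loss in closed form; as a rough illustration, the following PyTorch sketch shows one way a confidence-weighted MMD term between generator features and high-confidence pseudo-features could look. The RBF kernel, the weighting scheme, and all names (`zs_mmd_loss`, `rbf_kernel`, `sigma`) are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel between rows of x (n, d) and y (m, d).
    dists = torch.cdist(x, y) ** 2
    return torch.exp(-dists / (2 * sigma ** 2))

def zs_mmd_loss(gen_feats, pseudo_feats, confidence, sigma=1.0):
    """Confidence-weighted MMD^2 between generator features (n, d) and
    high-confidence pseudo-features (m, d); confidence is (m,)."""
    w = confidence / confidence.sum()                      # weights sum to 1
    k_gg = rbf_kernel(gen_feats, gen_feats, sigma).mean()  # E[k(g, g')]
    k_pp = (w[:, None] * w[None, :] *
            rbf_kernel(pseudo_feats, pseudo_feats, sigma)).sum()  # E_w[k(p, p')]
    k_gp = (rbf_kernel(gen_feats, pseudo_feats, sigma)
            * w[None, :]).mean(0).sum()                    # E[k(g, p)]
    return k_gg + k_pp - 2 * k_gp
```

In the recursive scheme described above, the pseudo-features would come from pixels the retrained classifier labels with high confidence, and the loss would be backpropagated to the generator.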
Related papers
- UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable for model training by generating and adding unlearnable noise to the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
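UnSeg itself trains one universal noise generator; as a simpler stand-in, the sketch below shows the underlying idea of error-minimizing (unlearnable) noise computed per image against a frozen surrogate segmentation model. The function name and hyperparameters (`unlearnable_noise`, `eps`, `steps`, `lr`) are illustrative assumptions, not UnSeg's actual procedure.

```python
import torch
import torch.nn.functional as F

def unlearnable_noise(model, images, masks, eps=8 / 255, steps=20, lr=1 / 255):
    """Per-image error-minimizing noise: perturb images so the frozen
    model's segmentation loss is already minimal, leaving little training
    signal. masks: (B, H, W) long labels; images: (B, 3, H, W)."""
    for p in model.parameters():                  # freeze the surrogate model
        p.requires_grad_(False)
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        logits = model(images + delta)            # (B, C, H, W) class scores
        loss = F.cross_entropy(logits, masks)     # standard segmentation loss
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()       # descend to MINIMIZE the loss
            delta.clamp_(-eps, eps)               # keep the noise imperceptible
            delta.grad.zero_()
    return (images + delta).detach()
```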
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages an existing pretrained vision-language (VL) model to train semantic segmentation models.
ZeroSeg overcomes the absence of pixel-level labels by distilling the visual concepts learned by the VL model into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
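As a minimal sketch of the distillation step, assuming the VL model's region embeddings (e.g., CLIP features of masked crops) are precomputed, one could align each segment token with its region embedding via a cosine loss; the function name and shapes below are assumptions, not ZeroSeg's actual implementation.

```python
import torch
import torch.nn.functional as F

def segment_distill_loss(segment_tokens, vl_region_feats):
    """Align learned segment tokens with VL-model embeddings of their
    regions. Both inputs: (num_segments, dim), assumed precomputed."""
    s = F.normalize(segment_tokens, dim=-1)
    t = F.normalize(vl_region_feats, dim=-1)
    return (1 - (s * t).sum(-1)).mean()  # 1 - cosine similarity, averaged
```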
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable end-to-end pre-training of image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experimental results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
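One plausible reading of "weighted segmentation learning" is a per-pixel, reliability-weighted cross-entropy against the explanation-derived pseudo masks; the sketch below illustrates that reading only and is not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def weighted_seg_loss(logits, pseudo_mask, pixel_weight):
    """Per-pixel weighted cross-entropy on pseudo-labels. Shapes are
    illustrative: logits (B, C, H, W), pseudo_mask (B, H, W) long,
    pixel_weight (B, H, W) reliability scores in [0, 1]."""
    ce = F.cross_entropy(logits, pseudo_mask, reduction="none")  # (B, H, W)
    return (pixel_weight * ce).sum() / pixel_weight.sum().clamp_min(1e-8)
```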
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses the previous state of the art by a large margin.
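The core zero-shot step shared by CLIP-based segmenters is scoring dense visual features against text embeddings of class-name prompts. A minimal sketch of that step follows; the paper's full framework may differ, and the names and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def zero_shot_pixel_logits(pixel_feats, text_embeds, temperature=0.01):
    """Score each pixel feature against class text embeddings.
    pixel_feats: (B, D, H, W) dense features; text_embeds: (C, D)
    from the CLIP text encoder, both assumed precomputed."""
    p = F.normalize(pixel_feats, dim=1)
    t = F.normalize(text_embeds, dim=-1)
    # Cosine similarity per pixel and class -> (B, C, H, W) logits.
    return torch.einsum("bdhw,cd->bchw", p, t) / temperature
```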
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation [84.1985497426083]
Convolutional neural networks are ill-equipped for incremental learning, where new classes become available but the initial training data is not retained.
We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise.
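A minimal sketch of such network inversion: optimize an input from random noise so the frozen model predicts a chosen mask for old classes. Real implementations add regularizers (e.g., total variation, batch-norm statistics matching) omitted here; the names and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def invert_segmenter(model, target_mask, shape=(1, 3, 256, 256),
                     steps=200, lr=0.1):
    """Synthesize an image from random noise such that the frozen
    segmentation model predicts target_mask ((1, H, W) long labels)."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), target_mask)  # match desired mask
        loss.backward()
        opt.step()
    return x.detach()
```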
arXiv Detail & Related papers (2021-04-02T03:47:16Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
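As a very loose sketch of the group-wise idea, one round of similarity-weighted message passing over image-level features treats each image in a group as a graph node, so co-occurring semantics reinforce each other; the paper's GNN is more elaborate, and everything below is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def group_message_passing(image_feats):
    """One round of message passing over a group of images.
    image_feats: (N, D), one feature vector per image (graph node)."""
    f = F.normalize(image_feats, dim=-1)
    adj = torch.softmax(f @ f.t(), dim=-1)  # soft affinity between images
    return adj @ image_feats                # propagate features along edges
```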
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.