Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic
Segmentation
- URL: http://arxiv.org/abs/2301.07336v1
- Date: Wed, 18 Jan 2023 06:55:02 GMT
- Title: Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic
Segmentation
- Authors: Son Duy Dao, Hengcan Shi, Dinh Phung, Jianfei Cai
- Abstract summary: Mask proposal models have significantly improved the performance of zero-shot semantic segmentation.
The use of a 'background' embedding during training in these methods is problematic, as the resulting model tends to over-learn and assign all unseen classes to the background class instead of their correct labels.
This paper proposes novel class enhancement losses to bypass the use of the background embedding during training, and simultaneously exploit the semantic relationship between text embeddings and mask proposals by ranking the similarity scores.
- Score: 40.09476732999614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent mask proposal models have significantly improved the performance of
zero-shot semantic segmentation. However, the use of a 'background' embedding
during training in these methods is problematic as the resulting model tends to
over-learn and assign all unseen classes as the background class instead of
their correct labels. Furthermore, they ignore the semantic relationship of
text embeddings, which arguably can be highly informative for zero-shot
prediction, as seen classes may be closely related to unseen classes. To
this end, this paper proposes novel class enhancement losses to bypass the use
of the background embedding during training, and simultaneously exploit the
semantic relationship between text embeddings and mask proposals by ranking the
similarity scores. To further capture the relationship between seen and unseen
classes, we propose an effective pseudo label generation pipeline using
a pretrained vision-language model. Extensive experiments on several benchmark
datasets show that our method achieves overall the best performance for
zero-shot semantic segmentation. Our method is flexible, and can also be
applied to the challenging open-vocabulary semantic segmentation problem.
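
To make the idea of "ranking the similarity scores" concrete, here is a minimal sketch of a pairwise margin ranking loss computed only over class text embeddings, with no background embedding. This is not the paper's loss: the abstract does not give the formulation, so the cosine similarity, the hinge form, the margin value, and the function names `cosine` and `ranking_loss` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranking_loss(proposal, text_embeddings, target, margin=0.2):
    """Hinge ranking loss for one mask-proposal embedding (illustrative only).

    The target class's similarity score should exceed every other class's
    score by at least `margin`. Only real class embeddings are scored, so
    no background embedding is involved.
    """
    scores = [cosine(proposal, t) for t in text_embeddings]
    return sum(
        max(0.0, margin - (scores[target] - s))
        for i, s in enumerate(scores)
        if i != target
    )


# Toy example: a proposal aligned with class 0 of two orthogonal classes.
proposal = [1.0, 0.0]
classes = [[1.0, 0.0], [0.0, 1.0]]
loss_correct = ranking_loss(proposal, classes, target=0)
loss_wrong = ranking_loss(proposal, classes, target=1)
```

In this toy case the loss is zero when the labelled class already ranks first by more than the margin, and positive when another class outranks it.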
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction by enrichment of class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z)
- Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation [7.5856806269316825]
Weakly supervised semantic segmentation (WSSS) employing weak forms of labels has been actively studied to alleviate the annotation cost of acquiring pixel-level labels.
We propose shortcut mitigating augmentation (SMA) for WSSS, which generates synthetic representations of object-background combinations not seen in the training data to reduce the use of shortcut features.
arXiv Detail & Related papers (2024-05-28T13:07:35Z)
- RaSP: Relation-aware Semantic Prior for Weakly Supervised Incremental
Segmentation [28.02204928717511]
We propose a weakly supervised approach to transfer objectness prior from the previously learned classes into the new ones.
We show how even a simple pairwise interaction between classes can significantly improve the segmentation mask quality of both old and new classes.
arXiv Detail & Related papers (2023-05-31T14:14:21Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided
Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, decent performance gain has been achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Incremental Learning in Semantic Segmentation from Image Labels [18.404068463921426]
Existing semantic segmentation approaches achieve impressive results, but struggle to update their models incrementally as new categories are uncovered.
This paper proposes a novel framework for Weakly Incremental Learning for Semantics, that aims at learning to segment new classes from cheap and largely available image-level labels.
As opposed to existing approaches, that need to generate pseudo-labels offline, we use an auxiliary classifier, trained with image-level labels and regularized by the segmentation model, to obtain pseudo-supervision online and update the model incrementally.
arXiv Detail & Related papers (2021-12-03T12:47:12Z)
- Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised
Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
arXiv Detail & Related papers (2021-07-25T11:39:58Z)