SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2401.11791v2
- Date: Mon, 11 Mar 2024 04:01:50 GMT
- Title: SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic
Segmentation
- Authors: Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen
- Abstract summary: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using image data with only image-level supervision.
We propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP latent space.
SemPLeS can perform better semantic alignment between object regions and the associated class labels.
- Score: 36.41778553250247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation
models using image data with only image-level supervision. Since precise
pixel-level annotations are not accessible, existing methods typically focus on
producing pseudo masks for training segmentation models by refining CAM-like
heatmaps. However, the produced heatmaps may capture only the discriminative
image regions of object categories or the associated co-occurring backgrounds.
To address these issues, we propose a Semantic Prompt Learning for WSSS (SemPLeS)
framework, which learns to effectively prompt the CLIP latent space to enhance
the semantic alignment between the segmented regions and the target object
categories. More specifically, we propose Contrastive Prompt Learning and
Prompt-guided Semantic Refinement to learn the prompts that adequately describe
and suppress the co-occurring backgrounds associated with each target object
category. In this way, SemPLeS can perform better semantic alignment between
object regions and the associated class labels, resulting in desired pseudo
masks for training the segmentation model. The proposed SemPLeS framework
achieves SOTA performance on the standard WSSS benchmarks, PASCAL VOC and MS
COCO, and shows compatibility with other WSSS methods. The source codes are
provided in the supplementary.
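The abstract describes two learning stages but no implementation details. As a rough illustration only, the Contrastive Prompt Learning step over the CLIP latent space could be sketched as below; all names are ours, the CLIP feature extraction is assumed, and the InfoNCE-style loss is a plausible stand-in for, not a reproduction of, the paper's objective.

```python
# Illustrative sketch (not the authors' code): learning a per-class
# "background" prompt embedding in the CLIP latent space, so that it can
# later be used to suppress co-occurring backgrounds in CAM refinement.
import torch
import torch.nn.functional as F

class BackgroundPrompt(torch.nn.Module):
    """One learnable prompt vector per object category (hypothetical)."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.prompt = torch.nn.Parameter(torch.randn(embed_dim) * 0.02)

    def forward(self) -> torch.Tensor:
        return F.normalize(self.prompt, dim=-1)

def contrastive_prompt_loss(prompt_emb, bg_feats, fg_feats, tau=0.07):
    """Pull the prompt toward CLIP features of background regions and push
    it away from foreground regions. bg_feats/fg_feats: (B, D), normalized."""
    pos = bg_feats @ prompt_emb / tau        # (B,) similarity to backgrounds
    neg = fg_feats @ prompt_emb / tau        # (B,) similarity to foregrounds
    logits = torch.stack([pos, neg], dim=1)  # (B, 2)
    labels = torch.zeros(pos.shape[0], dtype=torch.long, device=pos.device)
    return F.cross_entropy(logits, labels)   # positives at index 0
```

The learned prompt would then serve the Prompt-guided Semantic Refinement stage: regions similar to the background prompt are down-weighted in the class heatmaps before pseudo masks are extracted.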
Related papers
- Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary.
VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories.
We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
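A condensed sketch of what such a training-free, retrieval-based pipeline can look like follows; CaSED actually parses candidate names out of retrieved captions and re-scores them with both modalities, which this simplifies to a nearest-name lookup, and all argument names are ours.

```python
# Hedged sketch of a CaSED-style vocabulary-free classifier: retrieve
# candidate class names from an external text database by CLIP similarity,
# then pick the best-scoring candidate. No training involved.
import torch
import torch.nn.functional as F

@torch.no_grad()
def vocabulary_free_classify(image_emb, database_embs, database_names, k=10):
    image_emb = F.normalize(image_emb, dim=-1)          # (D,) CLIP image feature
    database_embs = F.normalize(database_embs, dim=-1)  # (N, D) text embeddings
    sims = database_embs @ image_emb                    # (N,) cosine similarities
    top = sims.topk(k)                                  # k nearest database entries
    candidates = [database_names[i] for i in top.indices.tolist()]
    return candidates[0], candidates  # topk is sorted, so the first is best
```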
arXiv Detail & Related papers (2024-04-16T19:27:21Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation [37.15828464616587]
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation.
We propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS).
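For context, the generic CAM computation this entry builds on (Zhou et al., 2016) is a weighted sum of the last convolutional feature maps; the sketch below shows that standard recipe, not QA-CLIMS itself.

```python
# Standard Class Activation Mapping (CAM): weight the last conv feature maps
# by the classifier weights of the target class, then keep positive evidence.
import torch
import torch.nn.functional as F

def compute_cam(feature_maps, classifier_weight, class_idx):
    """feature_maps: (C, H, W) from the last conv layer;
    classifier_weight: (num_classes, C) from the final linear layer."""
    w = classifier_weight[class_idx]                  # (C,) class weights
    cam = torch.einsum("c,chw->hw", w, feature_maps)  # weighted channel sum
    cam = F.relu(cam)                                 # keep positive evidence
    return cam / (cam.max() + 1e-8)                   # normalize to [0, 1]
```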
arXiv Detail & Related papers (2024-01-18T10:55:13Z)
- CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation [6.181169909576527]
Generalized Zero-shot Semantic Segmentation aims to segment both seen and unseen categories only under the supervision of the seen ones.
Existing methods adopt large-scale Vision-Language Models (VLMs), which achieve outstanding zero-shot performance.
We propose CLIP-ZSS (Zero-shot Semantic Segmentation), a training framework that enables any image encoder designed for closed-set segmentation to be applied to zero-shot and open-vocabulary tasks.
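The entry leaves the mechanism unspecified; one standard way CLIP can supervise or replace a segmentation classifier is to use its text embeddings as per-class weights over dense features, sketched below as a generic recipe rather than CLIP-ZSS's actual design.

```python
# Generic sketch: CLIP text embeddings act as per-class classifiers over the
# dense features of any image encoder aligned to the CLIP embedding space.
import torch
import torch.nn.functional as F

@torch.no_grad()
def dense_zero_shot_logits(pixel_feats, text_embs, tau=0.07):
    """pixel_feats: (C, H, W) dense features; text_embs: (K, C) CLIP text
    embeddings of class prompts. Returns (K, H, W) class logits."""
    C, H, W = pixel_feats.shape
    f = F.normalize(pixel_feats.reshape(C, -1), dim=0)  # (C, HW) unit pixels
    t = F.normalize(text_embs, dim=-1)                  # (K, C) unit prompts
    return (t @ f).reshape(-1, H, W) / tau              # cosine logits per class
```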
arXiv Detail & Related papers (2023-10-03T09:33:47Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
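A minimal sketch of such a relative-location pretext task, assuming a generic patch encoder and eliding the paper's masking scheme (all names are ours):

```python
# Sketch: predict where a query patch sits relative to a reference patch
# (e.g., one of 8 neighboring positions), trained with cross-entropy.
import torch
import torch.nn as nn

class RelativeLocationHead(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, num_positions: int = 8):
        super().__init__()
        self.encoder = encoder                        # maps patch -> (B, feat_dim)
        self.head = nn.Linear(2 * feat_dim, num_positions)

    def forward(self, query_patch, reference_patch):
        q = self.encoder(query_patch)                 # (B, feat_dim)
        r = self.encoder(reference_patch)             # (B, feat_dim)
        return self.head(torch.cat([q, r], dim=-1))   # (B, num_positions) logits

# Training step: cross_entropy(logits, true_relative_position_index).
```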
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- SLAM: Semantic Learning based Activation Map for Weakly Supervised Semantic Segmentation [34.996841532954925]
We propose a novel semantic learning based framework for WSSS, named SLAM (Semantic Learning based Activation Map).
We first design a semantic encoder to learn the semantics of each object category and extract category-specific semantic embeddings from an input image.
Four loss functions, i.e., category-foreground, category-background, activation regularization, and consistency loss, are proposed to ensure the correctness, completeness, compactness, and consistency of the activation map.
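The exact loss formulations are in the paper; the sketch below only illustrates one plausible way four such terms could be combined, with stand-in definitions for each.

```python
# Hedged sketch: a combined objective with stand-ins for the four SLAM losses
# named above; the real formulations in the paper will differ.
import torch
import torch.nn.functional as F

def slam_style_objective(act_map, act_map_aug, fg_sim, bg_sim,
                         w=(1.0, 1.0, 0.1, 1.0)):
    """act_map / act_map_aug: (B, H, W) activation maps for two views;
    fg_sim / bg_sim: category-embedding similarity to fg / bg regions."""
    l_fg = (1 - fg_sim).mean()                 # category should match foreground
    l_bg = F.relu(bg_sim).mean()               # ...and not the background
    l_reg = act_map.abs().mean()               # keep activations compact/sparse
    l_cons = F.mse_loss(act_map, act_map_aug)  # consistent across augmentations
    return w[0]*l_fg + w[1]*l_bg + w[2]*l_reg + w[3]*l_cons
```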
arXiv Detail & Related papers (2022-10-22T11:17:30Z)
- Weakly-supervised segmentation of referring expressions [81.73850439141374]
Text grounded semantic SEGmentation (TSEG) learns segmentation masks directly from image-level referring expressions, without pixel-level annotations.
Our approach demonstrates promising results for weakly-supervised referring expression segmentation on the PhraseCut and RefCOCO datasets.
arXiv Detail & Related papers (2022-05-10T07:52:24Z)
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
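Affinity-based propagation of CAMs is a standard operation; a generic sketch follows (not AuxSegNet's exact formulation, which learns the affinity from saliency and segmentation features rather than computing it from raw embeddings):

```python
# Sketch: random-walk propagation of CAM scores over a pixel-affinity matrix
# built from pairwise feature similarity, a common CAM-refinement recipe.
import torch
import torch.nn.functional as F

def propagate_cam(cam, features, iters=2):
    """cam: (K, H, W) class activation maps; features: (C, H, W) embeddings."""
    C, H, W = features.shape
    f = F.normalize(features.reshape(C, H * W), dim=0)   # (C, HW) unit pixels
    affinity = F.relu(f.t() @ f)                         # (HW, HW) similarities
    affinity = affinity / (affinity.sum(dim=1, keepdim=True) + 1e-8)  # row-norm
    x = cam.reshape(-1, H * W)                           # (K, HW)
    for _ in range(iters):
        x = x @ affinity.t()                             # diffuse scores
    return x.reshape(-1, H, W)
```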
arXiv Detail & Related papers (2021-07-25T11:39:58Z)
- Causal Intervention for Weakly-Supervised Semantic Segmentation [122.1846968696862]
We aim to generate better pixel-level pseudo-masks by using only image-level labels.
We propose a structural causal model to analyze the causalities among images, contexts, and class labels.
Based on this model, we develop a new method, Context Adjustment (CONTA), to remove the confounding bias in image-level classification.
arXiv Detail & Related papers (2020-09-26T09:26:29Z)