Weakly-Supervised Semantic Segmentation with Image-Level Labels: from
Traditional Models to Foundation Models
- URL: http://arxiv.org/abs/2310.13026v1
- Date: Thu, 19 Oct 2023 07:16:54 GMT
- Title: Weakly-Supervised Semantic Segmentation with Image-Level Labels: from
Traditional Models to Foundation Models
- Authors: Zhaozheng Chen and Qianru Sun
- Abstract summary: Weakly-supervised semantic segmentation (WSSS) is an effective solution to avoid pixel-level labels.
We focus on WSSS with image-level labels, the most challenging form of WSSS.
We investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid development of deep learning has driven significant progress in the
field of image semantic segmentation - a fundamental task in computer vision.
Semantic segmentation algorithms often depend on the availability of
pixel-level labels (i.e., masks of objects), which are expensive,
time-consuming, and labor-intensive. Weakly-supervised semantic segmentation
(WSSS) is an effective solution to avoid such labeling. It utilizes only
partial or incomplete annotations and provides a cost-effective alternative to
fully-supervised semantic segmentation. In this paper, we focus on WSSS with
image-level labels, which is the most challenging form of WSSS. Our work
has two parts. First, we conduct a comprehensive survey on traditional methods,
primarily focusing on those presented at premier research conferences. We
categorize them into four groups based on where their methods operate:
pixel-wise, image-wise, cross-image, and external data. Second, we investigate
the applicability of visual foundation models, such as the Segment Anything
Model (SAM), in the context of WSSS. We scrutinize SAM in two intriguing
scenarios: text prompting and zero-shot learning. We provide insights into the
potential and challenges associated with deploying visual foundation models
for WSSS, facilitating future developments in this exciting research area.
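For context on what "WSSS with image-level labels" looks like in practice, most pipelines in this line of work start from class activation maps (CAM): a classifier trained only with image-level labels is reused to produce coarse localization maps, which are then thresholded into pseudo-masks for training a segmentation network. Below is a minimal sketch of that pixel-wise starting point in PyTorch; the ResNet-50 backbone, the 20-class head, and the 0.3 threshold are illustrative assumptions, not details taken from the survey.

```python
# Minimal CAM sketch (assumption: a ResNet-50 multi-label classifier fine-tuned
# with image-level labels; class count and threshold are illustrative only).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=None)            # stand-in for a fine-tuned classifier
model.fc = torch.nn.Linear(2048, 20)             # image-level (multi-label) classification head
model.eval()

# Convolutional feature extractor: everything except global pooling and the fc head.
backbone = torch.nn.Sequential(*list(model.children())[:-2])

@torch.no_grad()
def class_activation_map(image, class_idx, threshold=0.3):
    """Return a binary pseudo-mask for `class_idx` using image-level supervision only."""
    feats = backbone(image)                              # [1, 2048, h, w] feature maps
    weights = model.fc.weight[class_idx]                  # [2048] classifier weights act as a class filter
    cam = torch.einsum("c,bchw->bhw", weights, feats)     # weighted sum over channels
    cam = F.relu(cam)
    cam = cam / (cam.max() + 1e-8)                        # normalize to [0, 1]
    cam = F.interpolate(cam[None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0]
    return (cam > threshold).float()                      # coarse pseudo-mask for segmentation training

# Usage: pseudo_mask = class_activation_map(torch.randn(1, 3, 224, 224), class_idx=3)
```

The thresholded map is deliberately crude; the pixel-wise, image-wise, cross-image, and external-data families of methods surveyed in the first part can all be read as different ways of improving on this starting point.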
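For the SAM scenarios the paper examines, one hedged way to connect the two parts is to turn the CAM pseudo-mask's bounding box into a box prompt for SAM. The snippet below assumes the official segment_anything package and a locally downloaded ViT-B checkpoint (the file path is an assumption); note that the released SAM API accepts point and box prompts rather than text, so the text-prompting scenario discussed in the paper would require an additional grounding model.

```python
# Hedged sketch: refine a CAM pseudo-mask with SAM via a box prompt.
# Assumes `pip install segment-anything` and a downloaded ViT-B checkpoint;
# the checkpoint path is illustrative, not taken from the survey.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def sam_refine(image_rgb: np.ndarray, pseudo_mask: np.ndarray,
               checkpoint: str = "sam_vit_b_01ec64.pth") -> np.ndarray:
    """Turn a coarse binary pseudo-mask into a tighter SAM mask using its bounding box."""
    ys, xs = np.nonzero(pseudo_mask)                          # pixels flagged by the CAM
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # XYXY box around the CAM blob

    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)                            # HxWx3 uint8 RGB image
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    return masks[0]                                           # boolean HxW refined mask
```

Whether such zero-shot refinement actually helps WSSS is precisely the kind of question the second part of the paper scrutinizes.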
Related papers
- Image Segmentation in Foundation Model Era: A Survey (arXiv, 2024-08-23)
  Current research in image segmentation lacks a detailed analysis of the distinct characteristics, challenges, and solutions associated with these advancements.
  This survey seeks to fill that gap by providing a thorough review of cutting-edge research centered on FM-driven image segmentation.
  An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts.
- Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey (arXiv, 2024-03-04)
  This review provides a first comprehensive and organized overview of state-of-the-art research on pseudo-label methods in semi-supervised semantic segmentation.
  It also explores the application of pseudo-label techniques in medical and remote-sensing image segmentation.
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels (arXiv, 2023-06-01)
  ZeroSeg is a method that leverages pretrained vision-language (VL) models to train semantic segmentation models without human labels.
  It distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
  The approach achieves state-of-the-art performance compared to other zero-shot segmentation methods trained on the same data.
- Semantic Image Segmentation: Two Decades of Research (arXiv, 2023-02-13)
  This book summarizes two decades of research in the field of semantic image segmentation (SiS).
  It reviews solutions from early historical methods to recent deep learning methods, including the latest trend of using transformers.
  It also covers newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation, and source-free domain adaptation.
- A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation (arXiv, 2021-11-02)
  Few-shot semantic segmentation addresses the setting in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest.
  The paper proposes a meta-learning framework that predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
  The proposed model can be viewed as a pixel-level meta-learner.
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization (arXiv, 2021-04-12)
  The paper proposes a framework for discriminative pixel-level tasks that uses a generative model of both images and labels.
  A generative adversarial network captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
  The method shows strong in-domain performance compared to several baselines and is the first to showcase extreme out-of-domain generalization.
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter (arXiv, 2020-08-10)
  A lightweight, weakly supervised deep network coarsely locates semantically salient regions.
  Multiple off-the-shelf deep models are then fused on these regions for pixel-wise saliency refinement.
  The method is simple yet effective and is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation (arXiv, 2020-07-03)
  Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
  In addition to boosting object pattern learning, the co-attention leverages context from related images to improve localization map inference.
  The algorithm sets new state-of-the-art results across all evaluated settings, demonstrating its efficacy and generalizability.