Related papers: Training-Free Semantic Segmentation via LLM-Supervision

Training-Free Semantic Segmentation via LLM-Supervision

URL: http://arxiv.org/abs/2404.00701v1
Date: Sun, 31 Mar 2024 14:37:25 GMT
Title: Training-Free Semantic Segmentation via LLM-Supervision
Authors: Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek,
Abstract summary: This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM) Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation. We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
Score: 37.9007813884699
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in open vocabulary models, like CLIP, have notably advanced zero-shot classification and segmentation by utilizing natural language for class-specific embeddings. However, most research has focused on improving model accuracy through prompt engineering, prompt learning, or fine-tuning with limited labeled data, thereby overlooking the importance of refining the class descriptors. This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM) that does not require extra training. Our method starts from an LLM, like GPT-3, to generate a detailed set of subclasses for more accurate class representation. We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels, resulting in diverse segmentation results tailored to each subclass's unique characteristics. Additionally, we propose an assembly that merges the segmentation maps from the various subclass descriptors to ensure a more comprehensive representation of the different aspects in the test images. Through comprehensive experiments on three standard benchmarks, our method outperforms traditional text-supervised semantic segmentation methods by a marked margin.

Related papers

Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation [0.0]
LangSeg is a novel semantic segmentation method that leverages context-sensitive, fine-grained subclass descriptors. We evaluate LangSeg on two challenging datasets, ADE20K and COCO-Stuff, where it outperforms state-of-the-art models.
arXiv Detail & Related papers (2025-01-27T20:02:12Z)
Learning Semantic-Aware Representation in Visual-Language Models for Multi-Label Recognition with Partial Labels [19.740929527669483]
Multi-label recognition with partial labels (MLR-PL) is a practical task in computer vision. We introduce a semantic decoupling module and a category-specific prompt optimization method in CLIP-based framework. Our method effectively separates information from different categories and achieves better performance compared to CLIP-based baseline method.
arXiv Detail & Related papers (2024-12-14T14:31:36Z)
LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation [16.864086165056698]
Existing open-vocabulary approaches leverage vision-language models, such as CLIP, to align visual features with rich semantic features acquired through pre-training on large-scale vision-language datasets. We propose to alleviate the issues by leveraging multiple large-scale models to enhance the alignment between fine-grained visual features and enriched linguistic features. Our method achieves state-of-the-art performance across all major open-vocabulary segmentation benchmarks.
arXiv Detail & Related papers (2024-11-30T05:49:42Z)
Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy. We first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z)
CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation [6.181169909576527]
Generalized Zero-shot Semantic aims to segment both seen and unseen categories only under the supervision of the seen ones. Existing methods adopt the large-scale Vision Language Models (VLMs) which obtain outstanding zero-shot performance. We propose CLIP-ZSS (Zero-shot Semantic), a training framework that enables any image encoder designed for closed-set segmentation applied in zero-shot and open-vocabulary tasks.
arXiv Detail & Related papers (2023-10-03T09:33:47Z)
Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings [19.997929884477628]
We explore the mechanism of class embeddings and have an insight that more explicit and meaningful class embeddings can be generated based on class masks purposely. We propose ECENet, a new segmentation paradigm, in which class embeddings are obtained and enhanced explicitly during interacting with multi-stage image features. Our ECENet outperforms its counterparts on the ADE20K dataset with much less computational cost and achieves new state-of-the-art results on PASCAL-Context dataset.
arXiv Detail & Related papers (2023-08-24T16:16:10Z)
Weakly Supervised 3D Point Cloud Segmentation via Multi-Prototype Learning [37.76664203157892]
A fundamental challenge here lies in the large intra-class variations of local geometric structure, resulting in subclasses within a semantic class. We leverage this intuition and opt for maintaining an individual classifier for each subclass. Our hypothesis is also verified given the consistent discovery of semantic subclasses at no cost of additional annotations.
arXiv Detail & Related papers (2022-05-06T11:07:36Z)
Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting. This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class. The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z)
TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations. We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios. Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z)
Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation. Our key idea is to decompose the holistic class representation into a set of part-aware prototypes. We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training. We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.