Related papers: Auto-Vocabulary Semantic Segmentation

Auto-Vocabulary Semantic Segmentation

URL: http://arxiv.org/abs/2312.04539v2
Date: Wed, 20 Mar 2024 16:11:22 GMT
Title: Auto-Vocabulary Semantic Segmentation
Authors: Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald,
Abstract summary: We introduce textitAuto-Vocabulary Semantics (AVS), advancing open-ended image understanding. Our framework autonomously identifies relevant class names using enhanced BLIP embedding. Our method sets new benchmarks on datasets such as PASCAL VOC and Context, ADE20K, and Cityscapes for AVS.
Score: 13.410217680999462
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-ended image understanding tasks gained significant attention from the research community, particularly with the emergence of Vision-Language Models. Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, they operate without the need for training or fine-tuning. However, OVS methods typically require users to specify the vocabulary based on the task or dataset at hand. In this paper, we introduce \textit{Auto-Vocabulary Semantic Segmentation (AVS)}, advancing open-ended image understanding by eliminating the necessity to predefine object categories for segmentation. Our approach, \ours, presents a framework that autonomously identifies relevant class names using enhanced BLIP embeddings, which are utilized for segmentation afterwards. Given that open-ended object category predictions cannot be directly compared with a fixed ground truth, we develop a Large Language Model-based Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically generated class names and their corresponding segments. Our method sets new benchmarks on datasets such as PASCAL VOC and Context, ADE20K, and Cityscapes for AVS and showcases competitive performance to OVS methods that require specified class names.

Related papers

From Open-Vocabulary to Vocabulary-Free Semantic Segmentation [78.62232202171919]
Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data. Current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications. This work proposes a Vocabulary-Free Semantic pipeline, eliminating the need for predefined class vocabularies.
arXiv Detail & Related papers (2025-02-17T15:17:08Z)
Lidar Panoptic Segmentation in an Open World [50.094491113541046]
Lidar Panoptics (LPS) is crucial for safe deployment of autonomous vehicles. LPS aims to recognize and segment lidar points wr.t. a pre-defined vocabulary of semantic classes. We propose a class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification.
arXiv Detail & Related papers (2024-09-22T00:10:20Z)
Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an un-constrained language-induced semantic space to an input image without needing a known vocabulary. VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories. We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
arXiv Detail & Related papers (2024-04-16T19:27:21Z)
Training-Free Semantic Segmentation via LLM-Supervision [37.9007813884699]
This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM) Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation. We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
arXiv Detail & Related papers (2024-03-31T14:37:25Z)
TAG: Guidance-free Open-Vocabulary Semantic Segmentation [6.236890292833387]
We propose TAG, which achieves Training,. and Guidance-free open-vocabulary segmentation. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. Our TAG achieves state-of-the-art results on PascalVOC, PascalContext and ADE20K for open-vocabulary segmentation without given class names.
arXiv Detail & Related papers (2024-03-17T12:49:02Z)
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation [33.336549577936196]
Weakly-Supervised Semantic (WSSS) aims to train segmentation models using image data with only image-level supervision. We propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP latent space. SemPLeS can perform better semantic alignment between object regions and class labels, resulting in desired pseudo masks for training segmentation models.
arXiv Detail & Related papers (2024-01-22T09:41:05Z)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration [68.41025728960176]
We study open-vocabulary segmentation (OVS) through calibrating in-vocabulary and domain-biased embedding space with contextual prior of CLIP. We present a Semantic-assisted CAlibration Network (SCAN) to achieve state-of-the-art performance on open-vocabulary segmentation benchmarks.
arXiv Detail & Related papers (2023-12-07T07:00:09Z)
Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS) We construct a large-scale complex scene dataset (textbfOVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z)
Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages the existing pretrained vision-language model (VL) to train semantic segmentation models. ZeroSeg overcomes this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image. Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes. This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance. We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
What's in a Name? Beyond Class Indices for Image Recognition [28.02490526407716]
We propose a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information. We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names. Our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.
arXiv Detail & Related papers (2023-04-05T11:01:23Z)
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category. We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP. Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations. Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.