Auto-Vocabulary Semantic Segmentation
- URL: http://arxiv.org/abs/2312.04539v2
- Date: Wed, 20 Mar 2024 16:11:22 GMT
- Title: Auto-Vocabulary Semantic Segmentation
- Authors: Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald,
- Abstract summary: We introduce textitAuto-Vocabulary Semantics (AVS), advancing open-ended image understanding.
Our framework autonomously identifies relevant class names using enhanced BLIP embedding.
Our method sets new benchmarks on datasets such as PASCAL VOC and Context, ADE20K, and Cityscapes for AVS.
- Score: 13.410217680999462
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-ended image understanding tasks gained significant attention from the research community, particularly with the emergence of Vision-Language Models. Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, they operate without the need for training or fine-tuning. However, OVS methods typically require users to specify the vocabulary based on the task or dataset at hand. In this paper, we introduce \textit{Auto-Vocabulary Semantic Segmentation (AVS)}, advancing open-ended image understanding by eliminating the necessity to predefine object categories for segmentation. Our approach, \ours, presents a framework that autonomously identifies relevant class names using enhanced BLIP embeddings, which are utilized for segmentation afterwards. Given that open-ended object category predictions cannot be directly compared with a fixed ground truth, we develop a Large Language Model-based Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically generated class names and their corresponding segments. Our method sets new benchmarks on datasets such as PASCAL VOC and Context, ADE20K, and Cityscapes for AVS and showcases competitive performance to OVS methods that require specified class names.
Related papers
- From Open-Vocabulary to Vocabulary-Free Semantic Segmentation [78.62232202171919]
Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data.
Current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications.
This work proposes a Vocabulary-Free Semantic pipeline, eliminating the need for predefined class vocabularies.
arXiv Detail & Related papers (2025-02-17T15:17:08Z) - LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation [16.864086165056698]
Existing open-vocabulary approaches leverage vision-language models, such as CLIP, to align visual features with rich semantic features acquired through pre-training on large-scale vision-language datasets.
We propose to alleviate the issues by leveraging multiple large-scale models to enhance the alignment between fine-grained visual features and enriched linguistic features.
Our method achieves state-of-the-art performance across all major open-vocabulary segmentation benchmarks.
arXiv Detail & Related papers (2024-11-30T05:49:42Z) - Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an un-constrained language-induced semantic space to an input image without needing a known vocabulary.
VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories.
We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
arXiv Detail & Related papers (2024-04-16T19:27:21Z) - TAG: Guidance-free Open-Vocabulary Semantic Segmentation [6.236890292833387]
We propose TAG, which achieves Training,.
and Guidance-free open-vocabulary segmentation.
It retrieves class labels from an external database, providing flexibility to adapt to new scenarios.
Our TAG achieves state-of-the-art results on PascalVOC, PascalContext and ADE20K for open-vocabulary segmentation without given class names.
arXiv Detail & Related papers (2024-03-17T12:49:02Z) - Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS)
We construct a large-scale complex scene dataset (textbfOVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z) - Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z) - What's in a Name? Beyond Class Indices for Image Recognition [28.02490526407716]
We propose a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information.
We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names.
Our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.
arXiv Detail & Related papers (2023-04-05T11:01:23Z) - Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z) - Open-world Semantic Segmentation via Contrasting and Clustering
Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.