A Semantic Space is Worth 256 Language Descriptions: Make Stronger
Segmentation Models with Descriptive Properties
- URL: http://arxiv.org/abs/2312.13764v1
- Date: Thu, 21 Dec 2023 11:43:41 GMT
- Title: A Semantic Space is Worth 256 Language Descriptions: Make Stronger
Segmentation Models with Descriptive Properties
- Authors: Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu,
Alan Yuille, Yuyin Zhou, Cihang Xie
- Abstract summary: ProLab is a novel approach using property-level label space for creating strong interpretable segmentation models.
It uses descriptive properties grounded in common sense knowledge for supervising segmentation models.
- Score: 55.17733971660755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces ProLab, a novel approach using property-level label
space for creating strong interpretable segmentation models. Instead of relying
solely on category-specific annotations, ProLab uses descriptive properties
grounded in common sense knowledge for supervising segmentation models. It is
based on two core designs. First, we employ Large Language Models (LLMs) and
carefully crafted prompts to generate descriptions of all involved categories
that carry meaningful common sense knowledge and follow a structured format.
Second, we introduce a description embedding model preserving semantic
correlation across descriptions and then cluster them into a set of descriptive
properties (e.g., 256) using K-Means. These properties are based on
interpretable common sense knowledge consistent with theories of human
recognition. We empirically show that our approach makes segmentation models
perform stronger on five classic benchmarks (e.g., ADE20K, COCO-Stuff, Pascal
Context, Cityscapes, and BDD). Our method also shows better scalability with
extended training steps than category-level supervision. Our interpretable
segmentation framework also emerges with the generalization ability to segment
out-of-domain or unknown categories using only in-domain descriptive
properties. Code is available at https://github.com/lambert-x/ProLab.
Related papers
- Training-Free Semantic Segmentation via LLM-Supervision [37.9007813884699]
This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM)
Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation.
We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
arXiv Detail & Related papers (2024-03-31T14:37:25Z) - From Text Segmentation to Smart Chaptering: A Novel Benchmark for
Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z) - SAMBA: A Trainable Segmentation Web-App with Smart Labelling [0.0]
SAMBA is a trainable segmentation tool that uses Meta's Segment Anything Model (SAM) for fast, high-quality label suggestions.
The segmentation backend is run in the cloud, so does not require the user to have powerful hardware.
arXiv Detail & Related papers (2023-12-07T10:31:05Z) - AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute
Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
arXiv Detail & Related papers (2023-08-31T19:34:09Z) - ISIM: Iterative Self-Improved Model for Weakly Supervised Segmentation [0.34265828682659694]
Weakly Supervised Semantic Conditional (WSSS) is a challenging task aiming to learn the segmentation labels from class-level labels.
We propose a framework that employs an iterative approach in a modified encoder-decoder-based segmentation model.
Experiments performed with DeepLabv3 and UNet models show a significant gain on the Pascal VOC12 dataset.
arXiv Detail & Related papers (2022-11-22T18:14:06Z) - Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z) - TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic
Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z) - APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic
Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z) - Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
The proposed method can improve several state-of-the-art baselines by a large margin (up to $33%$ relative gain) in terms of Recall@50.
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin.
arXiv Detail & Related papers (2020-09-12T17:36:53Z) - Commonality-Parsing Network across Shape and Appearance for Partially
Supervised Instance Segmentation [71.59275788106622]
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Our model significantly outperforms the state-of-the-art methods on both partially supervised setting and few-shot setting for instance segmentation on COCO dataset.
arXiv Detail & Related papers (2020-07-24T07:23:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.