AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute
Decomposition-Aggregation
- URL: http://arxiv.org/abs/2309.00096v2
- Date: Sat, 6 Jan 2024 04:10:27 GMT
- Title: AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute
Decomposition-Aggregation
- Authors: Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang, Yanfeng Wang
- Abstract summary: Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
- Score: 33.25304533086283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary semantic segmentation is a challenging task that requires
segmenting novel object categories at inference time. Recent studies have
explored vision-language pre-training to handle this task, but suffer from
unrealistic assumptions in practical scenarios, i.e., low-quality textual
category names. For example, this paradigm assumes that new textual categories
will be accurately and completely provided, and exist in lexicons during
pre-training. However, exceptions often happen when encountering ambiguity for
brief or incomplete names, new words that are not present in the pre-trained
lexicons, and difficult-to-describe categories for users. To address these
issues, this work proposes a novel attribute decomposition-aggregation
framework, AttrSeg, inspired by human cognition in understanding new concepts.
Specifically, in the decomposition stage, we decouple class names into diverse
attribute descriptions to complement semantic contexts from multiple
perspectives. Two attribute construction strategies are designed: using large
language models for common categories, and involving manually labeling for
human-invented categories. In the aggregation stage, we group diverse
attributes into an integrated global description, to form a discriminative
classifier that distinguishes the target object from others. One hierarchical
aggregation architecture is further proposed to achieve multi-level
aggregations, leveraging the meticulously designed clustering module. The final
results are obtained by computing the similarity between aggregated attributes
and images embeddings. To evaluate the effectiveness, we annotate three types
of datasets with attribute descriptions, and conduct extensive experiments and
ablation studies. The results show the superior performance of attribute
decomposition-aggregation.
Related papers
- Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection [12.094529796168384]
The representation of sentences and categories is a key issue in this task.
We propose a label-guided prompt method to represent sentences and categories.
Our method outperforms current state-of-the-art methods with a 3.86% - 4.75% improvement in the Macro-F1 score.
arXiv Detail & Related papers (2024-07-30T09:11:17Z) - Primitive Generation and Semantic-related Alignment for Universal
Zero-Shot Segmentation [13.001629605405954]
We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples.
We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces.
The proposed approach achieves impressively state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2023-06-19T17:59:16Z) - Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided
Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z) - APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic
Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z) - Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic and training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z) - Commonality-Parsing Network across Shape and Appearance for Partially
Supervised Instance Segmentation [71.59275788106622]
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Our model significantly outperforms the state-of-the-art methods on both partially supervised setting and few-shot setting for instance segmentation on COCO dataset.
arXiv Detail & Related papers (2020-07-24T07:23:44Z) - Description Based Text Classification with Reinforcement Learning [34.18824470728299]
We propose a new framework for text classification, in which each category label is associated with a category description.
We observe significant performance boosts over strong baselines on a wide range of text classification tasks.
arXiv Detail & Related papers (2020-02-08T02:14:28Z) - Don't Judge an Object by Its Context: Learning to Overcome Contextual
Bias [113.44471186752018]
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy.
This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations.
arXiv Detail & Related papers (2020-01-09T18:31:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.