Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
- URL: http://arxiv.org/abs/2407.16725v1
- Date: Tue, 23 Jul 2024 12:53:38 GMT
- Title: Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
- Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye
- Abstract summary: This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary.
The two contexts hierarchically construct a precise description for each category: a sample is first roughly classified into a predicted category, then examined to decide whether it is truly an ID sample or actually OOD.
The precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX).
- Score: 35.20091752343433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The key to out-of-distribution (OOD) detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have brought significant advances in both aspects, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for the current classification task, while spurious contexts further identify spurious OOD samples (similar to a category but not belonging to it) for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a given category: a sample is first roughly classified into the predicted category, and then delicately identified as truly an in-distribution (ID) sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. Extensive experiments demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses its rivals by a large margin under several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications. Code will be made public soon.
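The two-stage decision and the context merging described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the hand-made 4-dimensional vectors standing in for CLIP features and learned contexts, and the rule "flag as OOD when the spurious context of the predicted category matches better than its perceptual context" are all assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def hierarchical_ood_decision(image_feat, perceptual_ctx, spurious_ctx):
    """Hypothetical two-stage rule: classify first, then verify ID vs OOD.

    image_feat:     (d,)   feature of one image
    perceptual_ctx: (K, d) one perceptual embedding per category
    spurious_ctx:   (K, d) one spurious embedding per category
    Returns (predicted_category_index, is_ood).
    """
    f = l2_normalize(np.asarray(image_feat, dtype=float))
    p = l2_normalize(np.asarray(perceptual_ctx, dtype=float))
    s = l2_normalize(np.asarray(spurious_ctx, dtype=float))

    # Stage 1: rough classification using the perceptual contexts.
    perceptual_sim = p @ f
    pred = int(np.argmax(perceptual_sim))

    # Stage 2 (assumed rule): compare against the spurious context
    # of the predicted category only.
    spurious_sim = float(s[pred] @ f)
    is_ood = bool(spurious_sim > float(perceptual_sim[pred]))
    return pred, is_ood


def merge_contexts(ctx_a, ctx_b):
    """Extend the category set by stacking contexts from two sub-tasks."""
    return np.concatenate([ctx_a, ctx_b], axis=0)


# Toy sub-task A: categories "cat" (index 0) and "apple" (index 1).
perceptual = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
spurious = np.array([[0.8, 0.0, 0.0, 0.6], [0.0, 0.8, 0.0, 0.6]])

# Toy sub-task B: a single category "dog".
perceptual_b = np.array([[0.0, 0.0, 1.0, 0.0]])
spurious_b = np.array([[0.0, 0.0, 0.8, 0.6]])

merged_perceptual = merge_contexts(perceptual, perceptual_b)
merged_spurious = merge_contexts(spurious, spurious_b)

cat_image = np.array([0.9, 0.1, 0.0, 0.0])      # close to "cat"
panther_image = np.array([0.7, 0.0, 0.0, 0.7])  # cat-like but spurious
dog_image = np.array([0.0, 0.1, 0.9, 0.0])      # close to "dog"

print(hierarchical_ood_decision(cat_image, perceptual, spurious))
print(hierarchical_ood_decision(panther_image, perceptual, spurious))
print(hierarchical_ood_decision(dog_image, merged_perceptual, merged_spurious))
```

In this sketch, merging is a plain concatenation along the category axis, which mirrors the abstract's claim that contexts learned under different sub-task settings can simply be combined; how the actual method trains and scores the contexts is not specified on this page.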
Related papers
- Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection [12.094529796168384]
The representation of sentences and categories is a key issue in this task.
We propose a label-guided prompt method to represent sentences and categories.
Our method outperforms current state-of-the-art methods with a 3.86% - 4.75% improvement in the Macro-F1 score.
arXiv Detail & Related papers (2024-07-30T09:11:17Z)
- AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
arXiv Detail & Related papers (2023-08-31T19:34:09Z)
- Semantic Guided Level-Category Hybrid Prediction Network for Hierarchical Image Classification [8.456482280676884]
Hierarchical classification (HC) assigns each object multiple labels organized into a hierarchical structure.
We propose a novel semantic guided level-category hybrid prediction network (SGLCHPN) that can jointly perform the level and category prediction in an end-to-end manner.
arXiv Detail & Related papers (2022-11-22T13:49:10Z)
- Comparison Knowledge Translation for Generalizable Image Classification [31.530232003512957]
We build a generalizable framework that emulates the human recognition mechanism in the image classification task.
We put forward a Comparison Classification Translation Network (CCT-Net), which comprises a comparison classifier and a matching discriminator.
CCT-Net achieves surprising generalization ability on unseen categories and SOTA performance on target categories.
arXiv Detail & Related papers (2022-05-07T11:05:18Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
- Visual Boundary Knowledge Translation for Foreground Segmentation [57.32522585756404]
We make an attempt towards building models that explicitly account for visual boundary knowledge, in the hope of reducing the training effort on segmenting unseen categories.
With only tens of labeled samples as guidance, Trans-Net achieves results on par with fully supervised methods.
arXiv Detail & Related papers (2021-08-01T07:10:25Z)
- Category Contrast for Unsupervised Domain Adaptation in Visual Tasks [92.9990560760593]
We propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks.
CaCo is complementary to existing UDA methods and generalizable to other learning setups such as semi-supervised learning, unsupervised model adaptation, etc.
arXiv Detail & Related papers (2021-06-05T12:51:35Z)
- Towards Novel Target Discovery Through Open-Set Domain Adaptation [73.81537683043206]
Open-set domain adaptation (OSDA) considers that the target domain contains samples from novel categories unobserved in the external source domain.
We propose a novel framework to accurately identify the seen categories in the target domain, and effectively recover the semantic attributes for unseen categories.
arXiv Detail & Related papers (2021-05-06T04:22:29Z)
- DeepCAT: Deep Category Representation for Query Understanding in E-commerce Search [15.041444067591007]
We propose a deep learning model, DeepCAT, which learns joint word-category representations to enhance the query understanding process.
Our results show that DeepCAT reaches a 10% improvement on minority classes and a 7.1% improvement on tail queries over a state-of-the-art label embedding model.
arXiv Detail & Related papers (2021-04-23T18:04:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.