Related papers: Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

URL: http://arxiv.org/abs/2407.16725v2
Date: Thu, 14 Nov 2024 07:15:06 GMT
Title: Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye,
Abstract summary: This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary. The two contexts hierarchically construct the precise description for a certain category, which is first roughly classifying a sample to the predicted category. The precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX)
Score: 35.20091752343433
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP provide significant advances in both two issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for current classification tasks, while spurious contexts further identify spurious (similar but exactly not) OOD samples for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a certain category, which is, first roughly classifying a sample to the predicted category and then delicately identifying whether it is truly an ID sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. And extensive experiments are conducted to demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses the rivals by a large margin with several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up the prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications. Code is publicly available at https://github.com/alibaba/catex.

Related papers

Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval [23.472806734625774]
We propose Dual prompt Learning with Joint Category-Attribute Reweighting (DCAR) to achieve precise image-text matching.<n>Based on the prompt paradigm, DCAR jointly optimize attribute and class features to enhance fine-grained representation learning.
arXiv Detail & Related papers (2025-08-06T02:44:08Z)
AlignCAT: Visual-Linguistic Alignment of Category and Attributefor Weakly Supervised Visual Grounding [51.74170851840497]
Weakly supervised visual grounding aims to locate objects in images based on text descriptions.<n>Existing methods lack strong cross-modal reasoning to distinguish subtle semantic differences in text expressions.<n>We introduce AlignCAT, a novel query-based semantic matching framework for weakly supervised VG.
arXiv Detail & Related papers (2025-08-05T08:16:35Z)
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting [70.49268117587562]
We propose a plug-and-play Semantic-Driven Visual Prompt Tuning framework (SDVPT) that transfers knowledge from the training set to unseen categories. During inference, we dynamically synthesize the visual prompts for unseen categories based on the semantic correlation between unseen and training categories.
arXiv Detail & Related papers (2025-04-24T09:31:08Z)
Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$2$SRT) framework to explore the semantic correlation. The proposed framework consists of two complementary modules, i.e., intra-category semantic refinement (ISR) module and inter-category semantic transfer (IST) module. Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z)
Label-Guided Prompt for Multi-label Few-shot Aspect Category Detection [12.094529796168384]
The representation of sentences and categories is a key issue in this task. We propose a label-guided prompt method to represent sentences and categories. Our method outperforms current state-of-the-art methods with a 3.86% - 4.75% improvement in the Macro-F1 score.
arXiv Detail & Related papers (2024-07-30T09:11:17Z)
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios. This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
arXiv Detail & Related papers (2023-08-31T19:34:09Z)
Semantic Guided Level-Category Hybrid Prediction Network for Hierarchical Image Classification [8.456482280676884]
Hierarchical classification (HC) assigns each object with multiple labels organized into a hierarchical structure. We propose a novel semantic guided level-category hybrid prediction network (SGLCHPN) that can jointly perform the level and category prediction in an end-to-end manner.
arXiv Detail & Related papers (2022-11-22T13:49:10Z)
Comparison Knowledge Translation for Generalizable Image Classification [31.530232003512957]
We build a generalizable framework that emulates the humans' recognition mechanism in the image classification task. We put forward a Comparison Classification Translation Network (CCT-Net), which comprises a comparison classifier and a matching discriminator. CCT-Net achieves surprising generalization ability on unseen categories and SOTA performance on target categories.
arXiv Detail & Related papers (2022-05-07T11:05:18Z)
Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide model. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories. We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
Category Contrast for Unsupervised Domain Adaptation in Visual Tasks [92.9990560760593]
We propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks. CaCo is complementary to existing UDA methods and generalizable to other learning setups such as semi-supervised learning, unsupervised model adaptation, etc.
arXiv Detail & Related papers (2021-06-05T12:51:35Z)
Towards Novel Target Discovery Through Open-Set Domain Adaptation [73.81537683043206]
Open-set domain adaptation (OSDA) considers that the target domain contains samples from novel categories unobserved in external source domain. We propose a novel framework to accurately identify the seen categories in target domain, and effectively recover the semantic attributes for unseen categories.
arXiv Detail & Related papers (2021-05-06T04:22:29Z)
DeepCAT: Deep Category Representation for Query Understanding in E-commerce Search [15.041444067591007]
We propose a deep learning model, DeepCAT, which learns joint word-category representations to enhance the query understanding process. Our results show that DeepCAT reaches a 10% improvement on em minority classes and a 7.1% improvement on em tail queries over a state-of-the-art label embedding model.
arXiv Detail & Related papers (2021-04-23T18:04:44Z)
Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
The proposed method can improve several state-of-the-art baselines by a large margin (up to $33%$ relative gain) in terms of Recall@50. Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.