SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models
- URL: http://arxiv.org/abs/2501.11485v1
- Date: Mon, 20 Jan 2025 13:36:30 GMT
- Title: SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models
- Authors: Shu Zou, Xinyu Tian, Qinyu Zhao, Zhaoyuan Yang, Jing Zhang,
- Abstract summary: We investigate the image-text comprehension ability of vision-language models (VLMs) across semantically related ID labels.
We propose a novel post-hoc strategy called SimLabel to enhance the separability between ID and out-of-distribution (OOD) samples.
Our experiments demonstrate the superior performance of SimLabel on various zero-shot OOD detection benchmarks.
- Score: 7.90233294809002
- License:
- Abstract: Detecting out-of-distribution (OOD) data is crucial in real-world machine learning applications, particularly in safety-critical domains. Existing methods often leverage language information from vision-language models (VLMs) to enhance OOD detection by improving confidence estimation through rich class-wise text information. However, when building an OOD detection score upon in-distribution (ID) text-image affinity, existing works focus either on individual ID classes or on the whole ID label set, overlooking the inherent connections among ID classes. We find that semantic information shared across different ID classes is beneficial for effective OOD detection. We thus investigate the image-text comprehension ability of VLMs across semantically related ID labels and propose a novel post-hoc strategy called SimLabel. SimLabel enhances the separability between ID and OOD samples by establishing a more robust image-class similarity metric that considers consistency over a set of similar class labels. Extensive experiments demonstrate the superior performance of SimLabel on various zero-shot OOD detection benchmarks. The proposed method also extends to various VLM backbones, demonstrating good generalization ability. Our demonstration and implementation code is available at: https://github.com/ShuZou-1/SimLabel.
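The repository linked above contains the authors' implementation; as a quick illustration only, the sketch below shows how a consistency-guided score of this kind can extend a plain max-softmax image-text similarity score. The NumPy code, the temperature, the blending weight, and the way the similar-label sets are built are illustrative assumptions, not SimLabel's actual design choices.

```python
# Minimal sketch of consistency-guided zero-shot OOD scoring in the spirit of SimLabel.
# NOT the authors' implementation (see https://github.com/ShuZou-1/SimLabel for that);
# embeddings, temperature, blending weight, and similar-label sets are placeholders.
import numpy as np

def mcm_score(image_emb, id_text_embs, temperature=0.01):
    """Baseline: max softmax over image-to-ID-label cosine similarities."""
    logits = (id_text_embs @ image_emb) / temperature
    probs = np.exp(logits - logits.max())
    return (probs / probs.sum()).max()          # high for ID-like images, lower for OOD

def consistency_score(image_emb, id_text_embs, similar_sets, temperature=0.01, alpha=0.5):
    """For each ID class, blend its own similarity with the mean similarity over a set
    of semantically related ID labels, then take the max softmax of the blended scores.
    similar_sets[k] holds the indices of labels assumed similar to class k."""
    sims = id_text_embs @ image_emb
    blended = np.array([(1 - alpha) * sims[k] + alpha * sims[idx].mean()
                        for k, idx in enumerate(similar_sets)])
    logits = blended / temperature
    probs = np.exp(logits - logits.max())
    return (probs / probs.sum()).max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, num_classes = 512, 10
    id_text_embs = rng.normal(size=(num_classes, d))
    id_text_embs /= np.linalg.norm(id_text_embs, axis=1, keepdims=True)
    image_emb = id_text_embs[3] + 0.01 * rng.normal(size=d)     # an "ID-like" image
    image_emb /= np.linalg.norm(image_emb)
    # Toy similar-label sets: each class's 3 nearest neighbours in text-embedding space.
    text_sims = id_text_embs @ id_text_embs.T
    similar_sets = [np.argsort(-text_sims[k])[1:4] for k in range(num_classes)]
    print("MCM-style score:  ", mcm_score(image_emb, id_text_embs))
    print("Consistency score:", consistency_score(image_emb, id_text_embs, similar_sets))
```

Thresholding either score gives the ID/OOD decision; the consistency term is what rewards images whose affinity is stable across a class and its semantic neighbours.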
Related papers
- TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection [34.31570050254269]
Out-of-distribution (OOD) detection is crucial in many real-world applications.
We propose a new learning framework that leverages simple Jigsaw-based fake OOD data and rich semantic embeddings ('anchors') from ChatGPT descriptions of ID knowledge to help guide the training of the image encoder.
arXiv Detail & Related papers (2024-11-22T14:40:25Z)
- Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models [70.82728812001807]
A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool.
We theorize that enhancing performance requires expanding the semantic pool.
We show that expanding the OOD label candidates with the conjugated semantic pool (CSP) satisfies the requirements and outperforms existing works by 7.89% in FPR95.
arXiv Detail & Related papers (2024-10-11T08:24:11Z)
- Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure [23.266183020469065]
Outlier Label Exposure (OLE) is an approach to enhance zero-shot OOD detection using auxiliary outlier class labels.
OLE substantially improves detection performance and achieves new state-of-the-art performance in large-scale OOD and hard OOD detection benchmarks.
arXiv Detail & Related papers (2024-06-03T10:07:21Z)
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Detecting out-of-distribution (OOD) samples is crucial when deploying machine learning models in open-world scenarios.
We propose to leverage the expert knowledge and reasoning capability of large language models (LLMs) to envision potential outlier exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
- Learning Transferable Negative Prompts for Out-of-Distribution Detection [22.983892817676495]
We introduce a novel OOD detection method, named 'NegPrompt', to learn a set of negative prompts.
It learns such negative prompts with ID data only, without any reliance on external outlier data.
Experiments on various ImageNet benchmarks show that NegPrompt surpasses state-of-the-art prompt-learning-based OOD detection methods.
arXiv Detail & Related papers (2024-04-04T07:07:34Z)
- Negative Label Guided OOD Detection with Pretrained Vision-Language Models [96.67087734472912]
Out-of-distribution (OOD) detection aims at identifying samples from unknown classes.
We propose a novel post hoc OOD detection method, called NegLabel, which takes a vast number of negative labels from extensive corpus databases (a generic sketch of this negative-label scoring idea is given after this list).
arXiv Detail & Related papers (2024-03-29T09:19:52Z)
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses the knowledge of a pre-trained large language model (LLM) to learn prompts that adapt a pretrained vision-language model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
- Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection [67.68030805755679]
Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class.
In this paper, we propose to apply world knowledge to enhance OOD detection performance through selective generation from LLMs.
arXiv Detail & Related papers (2023-10-12T04:14:28Z)
- From Global to Local: Multi-scale Out-of-distribution Detection [129.37607313927458]
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Recent progress in representation learning gives rise to distance-based OOD detection.
We propose Multi-scale OOD DEtection (MODE), the first framework to leverage both global visual information and local region details.
arXiv Detail & Related papers (2023-08-20T11:56:25Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via pre-trained textual label embeddings.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
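Several of the methods above (NegLabel, CSP, OLE, and EOE) share a common zero-shot scoring pattern: an image is judged ID-like when most of its text-affinity mass falls on the ID labels rather than on a pool of negative or outlier labels. The sketch below illustrates only this shared pattern; the label pools, temperatures, and grouping strategies of the individual papers are not reproduced, and the placeholder names are assumptions.

```python
# Generic negative-label scoring sketch (the shared idea behind NegLabel/CSP/OLE/EOE),
# not any single paper's implementation; embeddings and temperature are placeholders.
import numpy as np

def negative_label_score(image_emb, id_text_embs, neg_text_embs, temperature=0.01):
    """Fraction of softmax mass an image assigns to ID labels over the joint
    ID + negative label set; near 1 for ID-like images, lower for OOD."""
    sims_id = id_text_embs @ image_emb          # cosine similarities (unit-norm inputs)
    sims_neg = neg_text_embs @ image_emb
    logits = np.concatenate([sims_id, sims_neg]) / temperature
    probs = np.exp(logits - logits.max())       # subtract max for numerical stability
    return probs[: len(sims_id)].sum() / probs.sum()
```

Thresholding this ratio yields the OOD decision; the papers differ mainly in where the negative labels come from (corpus mining, a conjugated semantic pool, auxiliary outlier class labels, or LLM-generated candidates) and in how the similarities are aggregated.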